Excel pivot-table-like functionality with R (plyr, dplyr?)
I would like to use R to create pivot tables faster than I can in Excel (and with less room for error).
For example, if I have a dataset like this:
id<-c("p","q","r","s","t","u","p","q","r","s","t","u")
time<-c(0,0,0,0,0,0,1,1,1,1,1,1)
foldchange<-rnorm(12)
log2foldchange<-rnorm(12)
p.value<-rnorm(12)
df<-data.frame(id,time,foldchange,log2foldchange,p.value)
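I would like to rearrange the table the way a pivot table does in Excel, so that it looks like this (or as close as possible):
Any ideas? I could not work out how to do this (or anything like it) from the examples here.
Thanks!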
Using data.table v1.9.5+, this is simple:
require(data.table) # v1.9.5+
dcast(setDT(df), id ~ time, value.var = names(df)[3:5])
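PS: I assume the p-values are only placeholders here, since some of them are negative or greater than 1. You should generate those random values from a uniform distribution instead.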
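As a minimal sketch of that suggestion, the example data could be built with runif() for the p-value column so that every value falls in [0, 1]. The seed, the name df_unif, and the choice to leave the other two columns as rnorm() draws are assumptions made for illustration only.

```r
set.seed(1)  # arbitrary seed, only so the example is reproducible

id   <- rep(c("p", "q", "r", "s", "t", "u"), times = 2)
time <- rep(c(0, 1), each = 6)
foldchange     <- rnorm(12)
log2foldchange <- rnorm(12)
p.value        <- runif(12)  # uniform on [0, 1], a valid range for p-values

# df_unif is a hypothetical name, used to keep it apart from the question's df
df_unif <- data.frame(id, time, foldchange, log2foldchange, p.value)
range(df_unif$p.value)  # both ends lie inside [0, 1]
```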
If you are generating random numbers for an example, you should call set.seed first:
set.seed(1)
id<-c("p","q","r","s","t","u","p","q","r","s","t","u")
time<-c(0,0,0,0,0,0,1,1,1,1,1,1)
foldchange<-rnorm(12)
log2foldchange<-rnorm(12)
p.value<-rnorm(12)
df<-data.frame(id,time,foldchange,log2foldchange,p.value)
reshape(df, dir = 'wide', idvar = 'id', timevar = 'time')
# id foldchange.0 log2foldchange.0 p.value.0 foldchange.1 log2foldchange.1 p.value.1
# 1 p -0.6264538 -0.62124058 0.61982575 0.4874291 0.82122120 1.35867955
# 2 q 0.1836433 -2.21469989 -0.05612874 0.7383247 0.59390132 -0.10278773
# 3 r -0.8356286 1.12493092 -0.15579551 0.5757814 0.91897737 0.38767161
# 4 s 1.5952808 -0.04493361 -1.47075238 -0.3053884 0.78213630 -0.05380504
# 5 t 0.3295078 -0.01619026 -0.47815006 1.5117812 0.07456498 -1.37705956
# 6 u -0.8204684 0.94383621 0.41794156 0.3898432 -1.98935170 -0.41499456
Or simply:
reshape(df, dir = 'wide')
# id foldchange.0 log2foldchange.0 p.value.0 foldchange.1 log2foldchange.1 p.value.1
# 1 p -0.6264538 -0.62124058 0.61982575 0.4874291 0.82122120 1.35867955
# 2 q 0.1836433 -2.21469989 -0.05612874 0.7383247 0.59390132 -0.10278773
# 3 r -0.8356286 1.12493092 -0.15579551 0.5757814 0.91897737 0.38767161
# 4 s 1.5952808 -0.04493361 -1.47075238 -0.3053884 0.78213630 -0.05380504
# 5 t 0.3295078 -0.01619026 -0.47815006 1.5117812 0.07456498 -1.37705956
# 6 u -0.8204684 0.94383621 0.41794156 0.3898432 -1.98935170 -0.41499456
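The shorter call works because reshape() defaults to idvar = "id" and timevar = "time", which happen to match the column names in this example. The check below is a small sketch of that point; it rebuilds the same df so it runs on its own, and the names wide_explicit and wide_default are made up for illustration.

```r
set.seed(1)
df <- data.frame(
  id   = rep(c("p", "q", "r", "s", "t", "u"), times = 2),
  time = rep(c(0, 1), each = 6),
  foldchange     = rnorm(12),
  log2foldchange = rnorm(12),
  p.value        = rnorm(12)
)

# Explicit call: name the id and time columns
wide_explicit <- reshape(df, direction = "wide", idvar = "id", timevar = "time")

# Short call: relies on the defaults idvar = "id", timevar = "time"
wide_default <- reshape(df, direction = "wide")

# Compare the values only, ignoring the bookkeeping attributes reshape() adds
all.equal(wide_explicit, wide_default, check.attributes = FALSE)
names(wide_default)
```

If the columns had other names (say, sample and timepoint), the short form would no longer find them, and idvar and timevar would have to be given explicitly.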
Pretty straightforward, right @data.table?
And a less intuitive version with dplyr and tidyr:
library(dplyr); library(tidyr)
df %>% gather(name, value, c(-id, -time)) %>% mutate(new=paste(name, time, sep=".")) %>%
select(-time, -name) %>% spread(new, value)
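The logic is as follows:
First, stack the columns foldchange through p.value into long format. This is done by df %>% gather(name, value, c(-id, -time)).
Next, concatenate the variables you want as column labels, as you would in Excel. This is done by the mutate(new=paste(name, time, sep=".")) part.
Finally, after selecting only the columns you are interested in, turn the concatenated variable back into columns with spread(new, value).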
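To make these three steps concrete, the same pipeline can be run one stage at a time and the intermediate results inspected. This is only a sketch: the intermediate names long, labelled, and wide are made up here, the data is rebuilt so the snippet runs on its own, and it assumes dplyr and tidyr are installed.

```r
library(dplyr)
library(tidyr)

set.seed(1)
df <- data.frame(
  id   = rep(c("p", "q", "r", "s", "t", "u"), times = 2),
  time = rep(c(0, 1), each = 6),
  foldchange     = rnorm(12),
  log2foldchange = rnorm(12),
  p.value        = rnorm(12)
)

# Step 1: stack the three measurement columns into name/value pairs
long <- df %>% gather(name, value, c(-id, -time))
dim(long)  # 36 rows (12 rows x 3 measures) and 4 columns

# Step 2: build the future column labels, such as "foldchange.0"
labelled <- long %>% mutate(new = paste(name, time, sep = "."))

# Step 3: drop the helper columns, then spread the labels back into columns
wide <- labelled %>% select(-time, -name) %>% spread(new, value)
names(wide)
```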
Depending on how you want the columns ordered, you could also try:
df %>% gather(name, value, c(-id, -time)) %>% mutate(new=paste(time, name, sep=".")) %>%
select(-time, -name) %>% spread(new, value)
The difference is mutate(new=paste(time, name, sep=".")), which puts the time first in each column label.
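For reference, gather() and spread() are marked as superseded in tidyr 1.0.0 and later, where pivot_wider() can do the same reshape in a single call. The following is a sketch of that alternative rather than part of the answer above; it assumes tidyr >= 1.0.0 is installed, and wide2 is a made-up name.

```r
library(tidyr)

set.seed(1)
df <- data.frame(
  id   = rep(c("p", "q", "r", "s", "t", "u"), times = 2),
  time = rep(c(0, 1), each = 6),
  foldchange     = rnorm(12),
  log2foldchange = rnorm(12),
  p.value        = rnorm(12)
)

# One row per id; one column per (measure, time) pair, e.g. "foldchange.0"
wide2 <- pivot_wider(
  df,
  names_from  = time,
  values_from = c(foldchange, log2foldchange, p.value),
  names_sep   = "."
)
names(wide2)
```

To put the time first in the labels, names_glue = "{time}.{.value}" can be supplied in place of names_sep.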