A special case of dcast in R
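My problem looks simple, and I am genuinely annoyed that I cannot get it to work. Suppose I have a simple data frame with one column for the group, g, and one variable, x. Because my grouping variable includes a "control" condition, I want to run a t.test of each of the other conditions against the control.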
library(data.table) # I am used to the data.table syntax, though I will happily accept a solution in any other dialect
# Generate dummy data
set.seed(1)
df <- data.table(x = rnorm(100), g = sample(LETTERS[1:3], size = 100, replace = TRUE))
setkey(df, g, x) # Order
df # Inspect data
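To do this, I would like to dcast the control group and add it as a new column. Since all I want is to run a t-test, which uses the whole group, I do not mind the order in which the values end up in the column. However, the function I normally use to go from long to wide format (dcast) does not seem to work here.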
# dcast approach
m <- dcast(df, x ~ g) # This is just... B*#!!it
So here is an approximation of what I am looking for:
# Kind of what I want
# Isolate control condition
Control <- df[g == "C"]
df[, C := rep(Control, 3)] # In this case it says there is a "remainder", though I would prefer to pad the variable x with NAs until completion
I would also be fine with having all of the groups A, B and C as columns.
Thanks in advance for your help.
Perhaps this is what the OP is asking for:
library(data.table)
dcast(df, rowid(g) ~ g, value.var = "x")
g A B C
1: 1 -1.804958629 -1.98935170 -2.21469989
2: 2 -1.470752384 -1.52356680 -0.74327321
3: 3 -1.276592208 -1.37705956 -0.62124058
4: 4 -1.253633400 -1.12936310 -0.61202639
5: 5 -1.224612615 -1.04413463 -0.58952095
6: 6 -0.934097632 -0.83562861 -0.47340064
7: 7 -0.709946431 -0.82046838 -0.41499456
8: 8 -0.707495157 -0.68875569 -0.39428995
9: 9 -0.626453811 -0.47815006 -0.30538839
10: 10 -0.573265414 -0.25336168 -0.13505460
11: 11 -0.568668733 -0.13517862 0.02800216
12: 12 -0.542520031 -0.11234621 0.39810588
13: 13 -0.443291873 -0.05931340 0.41794156
14: 14 -0.367221476 -0.05612874 0.55848643
15: 15 -0.304183924 -0.05380504 0.61982575
16: 16 -0.164523596 -0.01619026 0.69696338
17: 17 -0.155795507 0.07434132 0.82122120
18: 18 -0.102787727 0.15325334 0.88110773
19: 19 -0.044933609 0.34111969 0.94383621
20: 20 -0.039240003 0.36458196 1.12493092
21: 21 0.001105352 0.38767161 1.16040262
22: 22 0.074564983 0.48742905 1.17808700
23: 23 0.183643324 0.56971963 1.46555486
24: 24 0.188792300 0.59390132 1.51178117
25: 25 0.267098791 0.61072635 NA
26: 26 0.291446236 0.76317575 NA
27: 27 0.329507772 1.10002537 NA
28: 28 0.332950371 1.35867955 NA
29: 29 0.370018810 1.43302370 NA
30: 30 0.389843236 1.58683345 NA
31: 31 0.475509529 2.40161776 NA
32: 32 0.556663199 NA NA
33: 33 0.575781352 NA NA
34: 34 0.593946188 NA NA
35: 35 0.689739362 NA NA
36: 36 0.700213650 NA NA
37: 37 0.738324705 NA NA
38: 38 0.768532925 NA NA
39: 39 0.782136301 NA NA
40: 40 0.918977372 NA NA
41: 41 1.063099837 NA NA
42: 42 1.207867806 NA NA
43: 43 1.595280802 NA NA
44: 44 1.980399899 NA NA
45: 45 2.172611670 NA NA
g A B C
This works by artificially introducing a separate row count, rowid(g), for each group.
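To make the role of rowid(g) more concrete, here is a small sketch on a toy table. The table toy and its values are made up for illustration and are not the OP's data.

```r
library(data.table)

# Hypothetical toy input, only for illustration
toy <- data.table(x = c(0.1, 0.2, 0.3, 0.4, 0.5),
                  g = c("A", "A", "B", "A", "C"))

# rowid() numbers the rows within each distinct value of g
ids <- rowid(toy$g)
ids

# With that counter on the left-hand side of the formula, every
# (row id, group) pair is unique, so dcast places one value per cell
# and pads the shorter groups with NA
wide <- dcast(toy, rowid(g) ~ g, value.var = "x")
wide
```

Without such a counter, a formula like x ~ g uses the measured values themselves as row identifiers, which is why the result in the question is not useful.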
However, I do not see how this would help to solve the OP's underlying problem.
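For the underlying problem, a t-test of each condition against the control, reshaping may not be needed at all. The following is only a sketch under two assumptions: that "C" is the control group (as in the question's own attempt) and that the default Welch test of t.test is acceptable. The names control and res are introduced here for illustration.

```r
library(data.table)

# Same dummy data as in the question
set.seed(1)
df <- data.table(x = rnorm(100), g = sample(LETTERS[1:3], size = 100, replace = TRUE))

# Assumption: "C" is the control condition
control <- df[g == "C", x]

# One t-test per non-control group, computed directly in long format;
# inside j, x is the current group's values and control comes from the
# calling environment
res <- df[g != "C",
          {
            tt <- t.test(x, control)
            list(statistic = unname(tt$statistic), p.value = tt$p.value)
          },
          by = g]
res
```

This keeps the data in long format, so groups of unequal size need no NA padding. With several comparisons against the same control, a multiple-testing adjustment such as p.adjust may be worth considering.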