R data.table 中行子集的列均值
column mean for subset of rows in R data.table
我有一个 data.table 这样的:
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)
我需要计算多天的平均值,但 id 之间的平均值不同。
为了获得第 1 天到第 2 天的平均值,我尝试了以下方法(受 启发):
example[id==1, mean_over_days := mean(mean_day,na.rm=T), by=(day %in% 1:2)]
> example
id day time mean_day mean_over_days
1: 1 1 1 -1.53685184 -0.8908466
2: 1 1 2 0.77445521 -0.8908466
3: 1 1 3 -0.56048917 -0.8908466
4: 1 2 1 -1.78388960 -0.8908466
5: 1 2 2 0.01787129 -0.8908466
6: 1 2 3 -2.25617538 -0.8908466
7: 1 3 1 -0.44886190 -0.0955414
8: 1 3 2 -1.31086985 -0.0955414
9: 1 3 3 1.47310754 -0.0955414
10: 2 1 1 0.53560356 NA
11: 2 1 2 1.16654996 NA
12: 2 1 3 -0.06704728 NA
13: 2 2 1 -0.83897719 NA
14: 2 2 2 -0.85209939 NA
15: 2 2 3 -0.41392341 NA
16: 2 3 1 -0.03014190 NA
17: 2 3 2 0.43835822 NA
18: 2 3 3 -1.62432188 NA
我希望第 mean_over_days 列的 id==1 的所有行都具有相同的值 (-0.8908466
),但碰巧第 3 天该列具有当天的平均值只有.
我如何更改代码以更正此问题?
不要在 by
中创建子集,它会为 TRUE
和 FALSE
值创建不同的组。
library(data.table)
example[id==1, mean_over_days := mean(mean_day[day %in% 1:2],na.rm=TRUE)][]
# id day time mean_day mean_over_days
# 1: 1 1 1 -0.56047565 0.4471527
# 2: 1 1 2 -0.23017749 0.4471527
# 3: 1 1 3 1.55870831 0.4471527
# 4: 1 2 1 0.07050839 0.4471527
# 5: 1 2 2 0.12928774 0.4471527
# 6: 1 2 3 1.71506499 0.4471527
# 7: 1 3 1 0.46091621 0.4471527
# 8: 1 3 2 -1.26506123 0.4471527
# 9: 1 3 3 -0.68685285 0.4471527
#10: 2 1 1 -0.44566197 NA
#11: 2 1 2 1.22408180 NA
#12: 2 1 3 0.35981383 NA
#13: 2 2 1 0.40077145 NA
#14: 2 2 2 0.11068272 NA
#15: 2 2 3 -0.55584113 NA
#16: 2 3 1 1.78691314 NA
#17: 2 3 2 0.49785048 NA
#18: 2 3 3 -1.96661716 NA
数据
set.seed(123)
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)
我有一个 data.table 这样的:
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)
我需要计算多天的平均值,但 id 之间的平均值不同。
为了获得第 1 天到第 2 天的平均值,我尝试了以下方法(受
example[id==1, mean_over_days := mean(mean_day,na.rm=T), by=(day %in% 1:2)]
> example
id day time mean_day mean_over_days
1: 1 1 1 -1.53685184 -0.8908466
2: 1 1 2 0.77445521 -0.8908466
3: 1 1 3 -0.56048917 -0.8908466
4: 1 2 1 -1.78388960 -0.8908466
5: 1 2 2 0.01787129 -0.8908466
6: 1 2 3 -2.25617538 -0.8908466
7: 1 3 1 -0.44886190 -0.0955414
8: 1 3 2 -1.31086985 -0.0955414
9: 1 3 3 1.47310754 -0.0955414
10: 2 1 1 0.53560356 NA
11: 2 1 2 1.16654996 NA
12: 2 1 3 -0.06704728 NA
13: 2 2 1 -0.83897719 NA
14: 2 2 2 -0.85209939 NA
15: 2 2 3 -0.41392341 NA
16: 2 3 1 -0.03014190 NA
17: 2 3 2 0.43835822 NA
18: 2 3 3 -1.62432188 NA
我希望第 mean_over_days 列的 id==1 的所有行都具有相同的值 (-0.8908466
),但碰巧第 3 天该列具有当天的平均值只有.
我如何更改代码以更正此问题?
不要在 by
中创建子集,它会为 TRUE
和 FALSE
值创建不同的组。
library(data.table)
example[id==1, mean_over_days := mean(mean_day[day %in% 1:2],na.rm=TRUE)][]
# id day time mean_day mean_over_days
# 1: 1 1 1 -0.56047565 0.4471527
# 2: 1 1 2 -0.23017749 0.4471527
# 3: 1 1 3 1.55870831 0.4471527
# 4: 1 2 1 0.07050839 0.4471527
# 5: 1 2 2 0.12928774 0.4471527
# 6: 1 2 3 1.71506499 0.4471527
# 7: 1 3 1 0.46091621 0.4471527
# 8: 1 3 2 -1.26506123 0.4471527
# 9: 1 3 3 -0.68685285 0.4471527
#10: 2 1 1 -0.44566197 NA
#11: 2 1 2 1.22408180 NA
#12: 2 1 3 0.35981383 NA
#13: 2 2 1 0.40077145 NA
#14: 2 2 2 0.11068272 NA
#15: 2 2 3 -0.55584113 NA
#16: 2 3 1 1.78691314 NA
#17: 2 3 2 0.49785048 NA
#18: 2 3 3 -1.96661716 NA
数据
set.seed(123)
example <- data.table(
id=rep(1:2,each=9),
day=rep(rep(1:3,each=3),2),
time=rep(1:3,6),
mean_day=rnorm(18)
)