How to apply a function per row of a column in a data table with other rows as input?
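For each row of the column "Response", I want to check whether the 5 rows below it all have a "Response" value (i.e. no NA). If they do, I want to calculate the mean and standard deviation of those 5 rows. If any of those 5 rows is missing its "Response" value (i.e. is NA), the output should be NA, because I want the mean and standard deviation computed over exactly n = 5 values.
An example of Input.data looks like this: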
Response
NA
1
2
3
NA
1
1
2
3
4
5
This is the code I tried, which does not give the correct result:
Input.data$count.lag <- rollapplyr(Input.data[,c("Response")],list(-(4:0)),length, fill=NA)
Input.data$stdev <- ifelse(Input.data$count.lag <5, "NA",
rollapplyr(Input.data[,c("Response")],list(-(4:0)),sd,fill=NA))
Input.data$mean <- ifelse(Input.data$count.lag <5, "NA",
rollapplyr(Input.data[,c("Response")],list(-(4:0)),mean,fill=NA))
It gives the following, which is not what I want:
Response count.lag stdev mean
NA NA NA NA
1 NA NA NA
2 NA NA NA
3 NA NA NA
NA 5 NA NA
1 5 NA NA
1 5 NA NA
2 5 NA NA
3 5 NA NA
4 5 1.303840 2.2
5 5 1.581139 3.0
The output should look like this:
Response count.lag stdev mean
NA 4 NA NA
1 4 NA NA
2 4 NA NA
3 4 NA NA
NA 5 1.303840 2.2
1 5 1.581139 3.0
1 5 1.581139 4.0
2 5 1.581139 5.0
3 5 1.581139 6.0
4 5 1.581139 7.0
5 5 1.581139 8.0
Can anyone point out where the error is and/or suggest a workable alternative? Thanks!
A possible approach:
Input[, c("count.lag","stdev","mean") :=
transpose(lapply(1L:.N, function(n) {
x <- Response[(n+1L):min(n+5L, .N)]
c(sum(!is.na(x)), sd(x), mean(x))
}))]
Output:
Response count.lag stdev mean
1: NA 4 NA NA
2: 1 4 NA NA
3: 2 4 NA NA
4: 3 4 NA NA
5: NA 5 1.3038405 2.2
6: 1 5 1.5811388 3.0
7: 1 5 1.5811388 4.0
8: 2 5 1.5811388 5.0
9: 3 5 1.5811388 6.0
10: 4 5 1.5811388 7.0
11: 5 5 1.5811388 8.0
12: 6 4 1.2909944 8.5
13: 7 3 1.0000000 9.0
14: 8 2 0.7071068 9.5
15: 9 1 NA 10.0
16: 10 1 NA NA
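Note that rows 12 to 15 above are computed from fewer than five values, and row 16 reports a count of 1 because `17:16` counts backwards and picks up the row's own value. If the goal is strictly "NA unless all five following values are present", a guarded variant could look like the base R sketch below. The helper name `next_k_stats` and its argument `k` are hypothetical and not part of the original answer.

```r
# Hypothetical helper (an assumption, not from the answer): count, sd and mean
# of the k values that follow each position of x.
next_k_stats <- function(x, k = 5L) {
  n <- length(x)
  res <- t(sapply(seq_len(n), function(i) {
    # seq_len() gives an empty index on the last row,
    # whereas (i + 1):n would count backwards there.
    w <- x[i + seq_len(min(k, n - i))]
    cnt <- sum(!is.na(w))
    complete <- cnt == k  # TRUE only if all k following values exist and are non-NA
    c(count.lag = cnt,
      stdev     = if (complete) sd(w) else NA_real_,
      mean      = if (complete) mean(w) else NA_real_)
  }))
  as.data.frame(res)
}

x <- c(NA, 1, 2, 3, NA, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
out <- next_k_stats(x)
print(out)
```

Since the helper returns a data.frame with one column per statistic, it should be possible to assign its result to the three new columns with `:=`, although I have not checked that against every data.table version.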
Data:
Input <- fread("Response
NA
1
2
3
NA
1
1
2
3
4
5
6
7
8
9
10")
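As for the attempt in the question: judging from the output shown there, the offsets `list(-(4:0))` select the current row and the four rows above it, so each result lands on the last row of its window rather than on the row just before it. The base R sketch below mimics that with a hypothetical helper `window_at` (not part of zoo) to make the difference visible. If I read the zoo documentation correctly, offsets of `list(1:5)` would look at the five rows below instead. Two further points: `length` counts NA values too, and `"NA"` in quotes is a character string, not a missing value.

```r
x <- c(NA, 1, 2, 3, NA, 1, 1, 2, 3, 4, 5)

# Hypothetical helper (an assumption for illustration): the values of x at
# position i plus each offset, with NA for positions outside the vector.
window_at <- function(x, i, offsets) {
  idx <- i + offsets
  idx[idx < 1 | idx > length(x)] <- NA
  x[idx]
}

back  <- window_at(x, 10, -(4:0))  # rows 6..10: row 10 and the four rows above it
ahead <- window_at(x, 5, 1:5)      # rows 6..10: the five rows below row 5

# Same window, but attached to different rows.
print(identical(back, ahead))

# length() does not skip NA, so the window ending at row 5 still counts 5.
print(length(window_at(x, 5, -(4:0))))
print(sum(!is.na(window_at(x, 5, -(4:0)))))
```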
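Edit: or use shift, as suggested by MichaelChirico. The values in the last rows differ from the first approach; which is right depends on how the OP wants the trailing rows handled.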
# requires data.table version >= 1.12.0 to use negative shifts (else use type='lead' with positive integers)
Input[, c("count.lag", "stdev", "mean") :=
.SD[, shift(Response, -1L:-5L)][,
.(apply(.SD, 1L, function(x) sum(!is.na(x))),
apply(.SD, 1L, sd),
apply(.SD, 1L, mean))]
]
Output:
Response count.lag stdev mean
1: NA 4 NA NA
2: 1 4 NA NA
3: 2 4 NA NA
4: 3 4 NA NA
5: NA 5 1.303840 2.2
6: 1 5 1.581139 3.0
7: 1 5 1.581139 4.0
8: 2 5 1.581139 5.0
9: 3 5 1.581139 6.0
10: 4 5 1.581139 7.0
11: 5 5 1.581139 8.0
12: 6 4 NA NA
13: 7 3 NA NA
14: 8 2 NA NA
15: 9 1 NA NA
16: 10 0 NA NA
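The same "matrix of leading values" idea can also be written without row-wise `apply`, which may matter on long tables. The following is a minimal base R sketch, assuming a plain numeric vector `x` in place of the Response column; the names `leads`, `mean.next` and `k` are made up for illustration.

```r
x <- c(NA, 1, 2, 3, NA, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
k <- 5L

# Column j holds the value j rows below each position, padded with NA at the end.
leads <- sapply(seq_len(k), function(j) c(x[-seq_len(j)], rep(NA_real_, j)))

count.lag <- rowSums(!is.na(leads))

# rowMeans/rowSums propagate NA, so a row with any missing value among the
# next k comes out as NA.
mean.next <- rowMeans(leads)
stdev     <- sqrt(rowSums((leads - mean.next)^2) / (k - 1L))

print(data.frame(Response = x, count.lag, stdev, mean = mean.next))
```

Because every row is either complete (five values) or NA, dividing by `k - 1` gives the same sample standard deviation that `sd` would return for the complete rows.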