data.table update by reference for different columns
I am trying to set the columns trtxxp and trtxxa to NA wherever the corresponding sdx is NA.
Dummy data:
library(data.table)
dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")
dt
sd1 sd2 trt01p trt01a trt02p trt02a
1: 1 1 p a p a
2: 2 2 p a p a
3: 3 3 p a p a
4: NA 4 p a p a
5: 4 5 p a p a
6: 5 NA p a p a
7: NA 6 p a p a
8: 6 7 p a p a
9: 7 NA p a p a
10: 8 NA p a p a
11: 9 10 p a p a
12: 10 11 p a p a
13: NA 12 p a p a
14: NA 13 p a p a
I know I can achieve this with the following lines:
dt[is.na(sd1), `:=`(trt01p = NA,
                    trt01a = NA)]
dt[is.na(sd2), `:=`(trt02p = NA,
                    trt02a = NA)]
dt
sd1 sd2 trt01p trt01a trt02p trt02a
1: 1 1 p a p a
2: 2 2 p a p a
3: 3 3 p a p a
4: NA 4 <NA> <NA> p a
5: 4 5 p a p a
6: 5 NA p a <NA> <NA>
7: NA 6 <NA> <NA> p a
8: 6 7 p a p a
9: 7 NA p a <NA> <NA>
10: 8 NA p a <NA> <NA>
11: 9 10 p a p a
12: 10 11 p a p a
13: NA 12 <NA> <NA> p a
14: NA 13 <NA> <NA> p a
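The two statements above follow a single pattern: for each sd column, set its paired trt columns to NA in the rows where that sd column is NA. A minimal sketch that loops over a hand-written pairing while keeping the same := syntax, assuming the pairing trt01p/trt01a with sd1 and trt02p/trt02a with sd2 from the dummy data (the name pairs is hypothetical):

```r
library(data.table)

dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")

# Hypothetical pairing: each name is an sd column, each element lists the
# trt columns to blank out where that sd column is NA
pairs <- list(sd1 = c("trt01p", "trt01a"),
              sd2 = c("trt02p", "trt02a"))

for (s in names(pairs)) {
  # get(s) looks up the sd column by name inside i;
  # (pairs[[s]]) := NA updates those trt columns by reference in the matching rows
  dt[is.na(get(s)), (pairs[[s]]) := NA]
}
dt
```

Each iteration is exactly one of the hand-written statements above, so adding more sd columns only means extending the list.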
But because I have many columns, I tried using .SD, lapply and .SDcols instead, and it failed (only trt01p was updated correctly):
trt.col <- c("trt01p", "trt01a", "trt02a", "trt02p")
sd.col <- c("sd1", "sd2")
dt[, (trt.col) := lapply(.SD, function(x) ifelse(is.na(x), NA, get(trt.col))),
   .SDcols = sort(c(sd.col, sd.col))][]
dt
sd1 sd2 trt01p trt01a trt02p trt02a
1: 1 1 p p p p
2: 2 2 p p p p
3: 3 3 p p p p
4: NA 4 <NA> <NA> p p
5: 4 5 p p p p
6: 5 NA p p <NA> <NA>
7: NA 6 <NA> <NA> p p
8: 6 7 p p p p
9: 7 NA p p <NA> <NA>
10: 8 NA p p <NA> <NA>
11: 9 10 p p p p
12: 10 11 p p p p
13: NA 12 <NA> <NA> p p
14: NA 13 <NA> <NA> p p
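The NA pattern in this output is right, because .SDcols hands lapply sd1, sd1, sd2, sd2 in the same order as trt.col. The values appear to go wrong because get(trt.col) receives the whole trt.col vector, so every column is filled from the first one (trt01p). A minimal sketch that keeps the .SD/.SDcols idea but pairs each trt column with its own values and its own sd column explicitly, using base Map() and replace(); sd.for.trt is a hypothetical helper vector that must name the controlling sd column for each entry of trt.col, in the same order:

```r
library(data.table)

dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")

trt.col <- c("trt01p", "trt01a", "trt02a", "trt02p")
# Hypothetical helper: the sd column that controls each entry of trt.col
sd.for.trt <- c("sd1", "sd1", "sd2", "sd2")

# Map walks trt.col and sd.for.trt in parallel; replace() keeps each trt
# column's own values and sets NA only where the paired sd column is NA
dt[, (trt.col) := Map(function(v, s) replace(v, is.na(s), NA),
                      as.list(.SD)[trt.col],      # current trt values
                      as.list(.SD)[sd.for.trt]),  # matching sd columns
   .SDcols = unique(c(trt.col, sd.for.trt))]
dt
```

Listing a name twice in as.list(.SD)[...] simply returns that column twice, which is what lets sd1 and sd2 each drive two trt columns.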
Any suggestions on how to do this?
Thanks.
I think MichaelChirico's suggestion of a for loop might look like this:
cols <- list(sd1=c("trt01p", "trt01a"), sd2=c("trt02a", "trt02p"))
for (col in names(cols)) set(dt, which(is.na(dt[[col]])), cols[[col]], value = NA)
dt
# sd1 sd2 trt01p trt01a trt02p trt02a
# <int> <int> <char> <char> <char> <char>
# 1: 1 1 p a p a
# 2: 2 2 p a p a
# 3: 3 3 p a p a
# 4: NA 4 <NA> <NA> p a
# 5: 4 5 p a p a
# 6: 5 NA p a <NA> <NA>
# 7: NA 6 <NA> <NA> p a
# 8: 6 7 p a p a
# 9: 7 NA p a <NA> <NA>
# 10: 8 NA p a <NA> <NA>
# 11: 9 10 p a p a
# 12: 10 11 p a p a
# 13: NA 12 <NA> <NA> p a
# 14: NA 13 <NA> <NA> p a
(Though I feel I'm missing some data.table elegance somewhere.)
The named list encodes the dependency: its names are the columns you test for NA values, and each element holds the columns to update when that condition is met.
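With many columns, the named list does not have to be typed by hand. A minimal sketch, assuming (hypothetically) that every treatment column is named trt, then a number, then p or a, and is controlled by the column sd followed by the same number without leading zeros. It builds cols from the column names with base grep(), sub() and split() and then runs the same set() loop:

```r
library(data.table)

dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")

# All treatment columns, here "trt01p", "trt01a", "trt02p", "trt02a"
trt.col <- grep("^trt[0-9]+[pa]$", names(dt), value = TRUE)

# Assumed naming convention: trt01p -> sd1, trt12a -> sd12
sd.of <- sub("^trt0*([0-9]+)[pa]$", "sd\\1", trt.col)

# Named list: names are sd columns, elements are the trt columns they control
cols <- split(trt.col, sd.of)

for (col in names(cols)) set(dt, which(is.na(dt[[col]])), cols[[col]], value = NA)
dt
```

If the real column names follow a different convention, only the two regular expressions need to change; the loop itself stays the same.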