data.table update by reference for different columns

I am trying to set the columns trtXXp and trtXXa to NA in the rows where the matching sdX column is NA.

Dummy data:

library(data.table)
dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")
dt
    sd1 sd2 trt01p trt01a trt02p trt02a
 1:   1   1      p      a      p      a
 2:   2   2      p      a      p      a
 3:   3   3      p      a      p      a
 4:  NA   4      p      a      p      a
 5:   4   5      p      a      p      a
 6:   5  NA      p      a      p      a
 7:  NA   6      p      a      p      a
 8:   6   7      p      a      p      a
 9:   7  NA      p      a      p      a
10:   8  NA      p      a      p      a
11:   9  10      p      a      p      a
12:  10  11      p      a      p      a
13:  NA  12      p      a      p      a
14:  NA  13      p      a      p      a

I know I can do this with the following lines:

dt[is.na(sd1), `:=`(trt01p = NA,
                    trt01a = NA)]

dt[is.na(sd2), `:=`(trt02p = NA,
                    trt02a = NA)]
dt
    sd1 sd2 trt01p trt01a trt02p trt02a
 1:   1   1      p      a      p      a
 2:   2   2      p      a      p      a
 3:   3   3      p      a      p      a
 4:  NA   4   <NA>   <NA>      p      a
 5:   4   5      p      a      p      a
 6:   5  NA      p      a   <NA>   <NA>
 7:  NA   6   <NA>   <NA>      p      a
 8:   6   7      p      a      p      a
 9:   7  NA      p      a   <NA>   <NA>
10:   8  NA      p      a   <NA>   <NA>
11:   9  10      p      a      p      a
12:  10  11      p      a      p      a
13:  NA  12   <NA>   <NA>      p      a
14:  NA  13   <NA>   <NA>      p      a

But since I have many columns, I tried using .SD, lapply and .SDcols, and it failed (only trt01p is updated correctly):

trt.col <- c("trt01p", "trt01a", "trt02a", "trt02p")
sd.col <- c("sd1", "sd2")

dt[, (trt.col) := lapply(.SD, function(x) ifelse(is.na(x), NA, get(trt.col))),
     .SDcols = sort(c(sd.col, sd.col))][]
dt
    sd1 sd2 trt01p trt01a trt02p trt02a
 1:   1   1      p      p      p      p
 2:   2   2      p      p      p      p
 3:   3   3      p      p      p      p
 4:  NA   4   <NA>   <NA>      p      p
 5:   4   5      p      p      p      p
 6:   5  NA      p      p   <NA>   <NA>
 7:  NA   6   <NA>   <NA>      p      p
 8:   6   7      p      p      p      p
 9:   7  NA      p      p   <NA>   <NA>
10:   8  NA      p      p   <NA>   <NA>
11:   9  10      p      p      p      p
12:  10  11      p      p      p      p
13:  NA  12   <NA>   <NA>      p      p
14:  NA  13   <NA>   <NA>      p      p

Any suggestions on how to do this? Thanks.

I think MichaelChirico's suggestion of a for loop might look like this:

cols <- list(sd1=c("trt01p", "trt01a"), sd2=c("trt02a", "trt02p"))
for (col in names(cols)) set(dt, which(is.na(dt[[col]])), cols[[col]], value = NA)
dt
#       sd1   sd2 trt01p trt01a trt02p trt02a
#     <int> <int> <char> <char> <char> <char>
#  1:     1     1      p      a      p      a
#  2:     2     2      p      a      p      a
#  3:     3     3      p      a      p      a
#  4:    NA     4   <NA>   <NA>      p      a
#  5:     4     5      p      a      p      a
#  6:     5    NA      p      a   <NA>   <NA>
#  7:    NA     6   <NA>   <NA>      p      a
#  8:     6     7      p      a      p      a
#  9:     7    NA      p      a   <NA>   <NA>
# 10:     8    NA      p      a   <NA>   <NA>
# 11:     9    10      p      a      p      a
# 12:    10    11      p      a      p      a
# 13:    NA    12   <NA>   <NA>      p      a
# 14:    NA    13   <NA>   <NA>      p      a

(Though I feel I'm missing some data.table elegance somewhere.)
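For something closer to the .SD attempt in the question, one possible sketch is to pair each trt column with its sd column by position and do the masking in a single := call. The helper vectors sd.for.trt and sd.list, and the regular expression that derives "sd1" from "trt01p", are assumptions made for illustration; they rely on the naming pattern seen in the dummy data and may need adjusting for real column names.

```r
library(data.table)

dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")

trt.col <- c("trt01p", "trt01a", "trt02a", "trt02p")

# Hypothetical helper: derive the controlling sd column for each trt column,
# assuming names follow the pattern trt<zero-padded number><p|a>.
sd.for.trt <- sub("^trt0*([0-9]+)[pa]$", "sd\\1", trt.col)

# One sd vector per trt column, in the same order as trt.col.
sd.list <- lapply(sd.for.trt, function(s) dt[[s]])

# Walk over the (trt, sd) pairs; blank out trt wherever its sd is NA.
dt[, (trt.col) := Map(function(trt, sd) replace(trt, is.na(sd), NA),
                      .SD, sd.list),
   .SDcols = trt.col]
```

Unlike the original attempt, each target column is read from .SD itself rather than through get(), so every trt column keeps its own values where the sd column is not NA.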

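In the loop above, the named list cols encodes the dependency: its names are the columns you want to test for NA values, and the contents of each element are the columns that need to be updated in the rows where that condition holds.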
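With many columns, the mapping can probably be built from the column names instead of being typed out. This is a minimal sketch, assuming that sdN always governs the columns trtNNp and trtNNa with a two-digit, zero-padded number; that convention is inferred from the dummy data, not something the question states outright.

```r
library(data.table)

dt <- data.table(sd1 = c(1:3, NA, 4:5, NA, 6:10, NA, NA),
                 sd2 = c(1:5, NA, 6:7, NA, NA, 10:13),
                 trt01p = "p", trt01a = "a",
                 trt02p = "p", trt02a = "a")

# Columns to test for NA (assumed pattern: "sd" followed by digits).
sd.col <- grep("^sd[0-9]+$", names(dt), value = TRUE)

# For each sd column, collect the trt columns that share its number.
cols <- lapply(sd.col, function(s) {
  idx <- sprintf("%02d", as.integer(sub("^sd", "", s)))
  grep(paste0("^trt", idx, "[pa]$"), names(dt), value = TRUE)
})
names(cols) <- sd.col

# Same loop as before: update by reference, one sd column at a time.
for (col in names(cols)) {
  set(dt, i = which(is.na(dt[[col]])), j = cols[[col]], value = NA)
}
```

Keeping the mapping as a separate named list makes it easy to inspect before any update by reference is applied.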