Apply a function to variables with a name pattern for each id with NA values

Question

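I have a data.table in which I want to compute, for each id, the mean of the group of variables whose names start with "amount".

The number of variables starting with "amount" can vary, but in my real data there are well over 100 of them (and some of the variables contain NA values).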

id  variable    amountA amountB amountC amountD
1   A   8   7   6   2
2   B   6   2   1   2
3   C   6   6   9   4
4   D   1   6   2   7

On my data, I have tried the following without success:

DT[,testvar := apply(DT[ ,grepl("amount",names(DT))],1,mean)]
DT[,testvar := mean(DT[ ,grepl("amount",names(DT))],na.rm=TRUE), by = idvar]

I have been trying to solve this with .EACHI, but I haven't figured it out yet. Any ideas or comments are greatly appreciated.

Sample table:

structure(list(id = 1:4, variable = structure(1:4, .Label = c("A", 
"B", "C", "D"), class = "factor"), amountA = c(8L, 6L, 6L, 1L
), amountB = c(7L, 2L, 6L, 6L), amountC = c(6L, 1L, 9L, 2L), 
    amountD = c(2L, 2L, 4L, 7L)), .Names = c("id", "variable", 
"amountA", "amountB", "amountC", "amountD"), class = "data.frame", row.names = c(NA, 
-4L))

Answer 1

Following some suggestions from Arun, here is one possible solution:

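# DT must be a data.table; the sample above is a data.frame, so convert it first (e.g. with setDT(DT))
DT[, testvar := rowMeans(.SD, na.rm = TRUE), .SDcols = grep("^amount", names(DT), value = TRUE)]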

which produces:

   id variable amountA amountB amountC amountD testvar
1:  1        A       8       7       6       2    5.75
2:  2        B       6       2       1       2    2.75
3:  3        C       6       6       9       4    6.25
4:  4        D       1       6       2       7    4.00

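We use .SDcols together with grep to choose the columns that should be part of the internal .SD object, and then simply call rowMeans on the resulting .SD. With na.rm = TRUE, NA values are dropped row by row before the mean is taken.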
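To make the NA handling concrete, here is a minimal, self-contained sketch of the same approach. It rebuilds the sample table from the question directly as a data.table; the single NA in amountB is an assumption added purely for illustration (the posted sample has no missing values), and amount_cols is just a helper name chosen here.

```r
library(data.table)

# Rebuild the sample table from the question as a data.table.
DT <- data.table(
  id = 1:4,
  variable = factor(c("A", "B", "C", "D")),
  amountA = c(8L, 6L, 6L, 1L),
  amountB = c(7L, 2L, 6L, 6L),
  amountC = c(6L, 1L, 9L, 2L),
  amountD = c(2L, 2L, 4L, 7L)
)

# Assumption for illustration: blank out one value to mimic NAs in the real data.
DT[id == 2L, amountB := NA_integer_]

# Names of all columns that start with "amount" (hypothetical helper variable).
amount_cols <- grep("^amount", names(DT), value = TRUE)

# .SD holds only the amount columns; rowMeans() averages across them per row,
# skipping NA values because of na.rm = TRUE.
DT[, testvar := rowMeans(.SD, na.rm = TRUE), .SDcols = amount_cols]

print(DT)
```

For id 2 the mean is then taken over the three remaining values only. Note that this gives one mean per row, so it matches "per id" only as long as each id appears on a single row, as in the sample.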

In newer versions of data.table, you can simplify this by using patterns inside .SDcols:

DT[, testvar := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns('amount')]
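One thing to keep in mind is that patterns takes a regular expression, so 'amount' matches the string anywhere in a column name, whereas the grep version above anchors it with "^amount". The sketch below shows the difference on a small made-up table; the column total_amount and the result columns loose and strict are hypothetical names used only for this illustration.

```r
library(data.table)

# Small made-up table; total_amount is a hypothetical column that is NOT in the question's data.
DT <- data.table(
  id = 1:2,
  amountA = c(8, 6),
  amountB = c(NA, 2),
  total_amount = c(100, 200)
)

# Unanchored pattern: also picks up total_amount.
DT[, loose := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("amount")]

# Anchored pattern: only columns whose names start with "amount".
DT[, strict := rowMeans(.SD, na.rm = TRUE), .SDcols = patterns("^amount")]

print(DT)
```

If every column containing "amount" really is one you want to average, the unanchored form is fine; otherwise anchoring the pattern is the safer choice.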


loops

r

data.table