Row sums over columns with a certain pattern in their name
I have a data.table like this:
```r
dput(DT)
structure(list(ref = c(3L, 3L, 3L, 3L), nb = 12:15, i1 = c(3.1e-05,
0.044495, 0.82244, 0.322291), i2 = c(0.000183, 0.155732, 0.873416,
0.648545), i3 = c(0.000824, 0.533939, 0.838542, 0.990648), i4 = c(0.044495,
0.82244, 0.322291, 0.393595)), .Names = c("ref", "nb", "i1",
"i2", "i3", "i4"), row.names = c(NA, -4L), class = c("data.table",
"data.frame"))
DT
#    ref nb       i1       i2       i3       i4
# 1:   3 12 0.000031 0.000183 0.000824 0.044495
# 2:   3 13 0.044495 0.155732 0.533939 0.822440
# 3:   3 14 0.822440 0.873416 0.838542 0.322291
# 4:   3 15 0.322291 0.648545 0.990648 0.393595
```
Now I want to compute row sums, but only over the columns whose names start with "i" ("i1", "i2", etc.).
I have already created a vector of the column names to sum, using `grep`:

```r
listCol <- colnames(DT)[grep("^i", colnames(DT))]
listCol
# [1] "i1" "i2" "i3" "i4"
```
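The unanchored pattern `"i"` matches any name that merely *contains* an "i"; it only works on this data because no other column has one. A small base-R sketch, using a hypothetical extra column name `"ratio"` (not in the original data), shows why anchoring with `^` matters, and that `startsWith()` is a non-regex alternative for a fixed prefix:

```r
# The question's six column names, plus a hypothetical "ratio" column
# added only to show what an unanchored pattern picks up.
cols <- c("ref", "nb", "i1", "i2", "i3", "i4", "ratio")

# Unanchored: matches names containing "i" anywhere
loose <- grep("i", cols, value = TRUE)
print(loose)
# [1] "i1"    "i2"    "i3"    "i4"    "ratio"

# Anchored with "^": matches only names that start with "i"
anchored <- grep("^i", cols, value = TRUE)
print(anchored)
# [1] "i1" "i2" "i3" "i4"

# Non-regex alternative for a fixed prefix
prefixed <- cols[startsWith(cols, "i")]
```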
Then I loop over the columns:

```r
DT$sum <- rep.int(0, nrow(DT))
for (i in listCol) {
  DT$sum <- DT$sum + DT[, get(i)]
}
```

...which gives the desired output:

```r
DT
#    ref nb       i1       i2       i3       i4      sum
# 1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079
```
How can I improve my code?
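If an explicit loop is still preferred, data.table's `set()` updates a column by reference, whereas `DT$sum <- ...` can copy the table on each iteration. A minimal sketch, assuming the data is rebuilt with `data.table()` instead of pasted from `dput()`, and using `DT[[col]]` to extract a column by name without `get()`:

```r
library(data.table)

# The question's data, rebuilt directly instead of via dput()
DT <- data.table(
  ref = c(3L, 3L, 3L, 3L),
  nb  = 12:15,
  i1  = c(0.000031, 0.044495, 0.822440, 0.322291),
  i2  = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3  = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4  = c(0.044495, 0.822440, 0.322291, 0.393595)
)

# Columns whose names start with "i"
listCol <- grep("^i", names(DT), value = TRUE)

# Create the accumulator column (0 is recycled to every row)
set(DT, j = "sum", value = 0)
for (col in listCol) {
  # set() modifies DT in place; DT[[col]] pulls the column by name
  set(DT, j = "sum", value = DT[["sum"]] + DT[[col]])
}
print(DT)
```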
Sub-question (it builds on a partial answer to the question above): how can I avoid this strange notation?
```r
myrowMeans <- function(x) {
  rowMeans(x, na.rm = TRUE)
}
DT[, var := myrowMeans((.SD - myrowMeans(.SD))^2), .SDcols = grep("^i", colnames(DT))]
```
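The wrapper only exists to pass `na.rm = TRUE`, which can be given to `rowMeans()` directly. A base-R sketch on the four "i" columns as a plain numeric matrix (an assumption for illustration; the question applies the same arithmetic to `.SD`). Note that the mean of squared deviations divides by n, while base R's `var()` divides by n - 1:

```r
# The four i-columns from the question as a plain numeric matrix
m <- cbind(
  i1 = c(0.000031, 0.044495, 0.822440, 0.322291),
  i2 = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3 = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4 = c(0.044495, 0.822440, 0.322291, 0.393595)
)

# Row means, ignoring NAs, with no wrapper function
mu <- rowMeans(m, na.rm = TRUE)

# m - mu recycles mu down each column, so each row's mean is
# subtracted from that row; the result divides by n
pop_var <- rowMeans((m - mu)^2, na.rm = TRUE)

# Sample variance per row (divides by n - 1), as var() computes it
sample_var <- apply(m, 1, var, na.rm = TRUE)
```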
Use `.SDcols` to specify the columns, then take `rowSums`; use `:=` to assign the new column:

```r
DT[, sum := rowSums(.SD), .SDcols = grep("^i", names(DT))]
```
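A self-contained version of this answer, rebuilding the data with `data.table()`. It assumes a data.table release recent enough for `.SDcols` to accept `patterns()`, which selects columns by regex without building the index vector by hand:

```r
library(data.table)

DT <- data.table(
  ref = c(3L, 3L, 3L, 3L),
  nb  = 12:15,
  i1  = c(0.000031, 0.044495, 0.822440, 0.322291),
  i2  = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3  = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4  = c(0.044495, 0.822440, 0.322291, 0.393595)
)

# patterns("^i") restricts .SD to columns whose names start with "i";
# rowSums() adds across them for each row, and := stores the result
# as a new column by reference.
DT[, sum := rowSums(.SD), .SDcols = patterns("^i")]
print(DT)
```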
You could also try `Reduce`:

```r
DT[, Sum := Reduce(`+`, .SD), .SDcols = listCol][]
#    ref nb       i1       i2       i3       i4      Sum
# 1:   3 12 0.000031 0.000183 0.000824 0.044495 0.045533
# 2:   3 13 0.044495 0.155732 0.533939 0.822440 1.556606
# 3:   3 14 0.822440 0.873416 0.838542 0.322291 2.856689
# 4:   3 15 0.322291 0.648545 0.990648 0.393595 2.355079
```
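`Reduce(`+`, .SD)` works because a data.table is a list of columns: `Reduce` folds `+` pairwise over that list from the left, so with four columns it evaluates `((i1 + i2) + i3) + i4`, vectorized over rows. A base-R sketch on a plain list standing in for `.SD`:

```r
# A list of equal-length numeric vectors, like the columns in .SD
cols <- list(
  i1 = c(0.000031, 0.044495, 0.822440, 0.322291),
  i2 = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3 = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4 = c(0.044495, 0.822440, 0.322291, 0.393595)
)

# Reduce folds `+` left to right: ((i1 + i2) + i3) + i4
total <- Reduce(`+`, cols)

# The same fold written out explicitly
manual <- ((cols$i1 + cols$i2) + cols$i3) + cols$i4

# accumulate = TRUE keeps every partial sum; simplify = FALSE keeps
# the result as a list instead of binding it into a matrix
partials <- Reduce(`+`, cols, accumulate = TRUE, simplify = FALSE)
```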
Note: if there are `NA` values, they should be replaced with 0 before `Reduce` is applied, i.e.

```r
DT[, Sum := Reduce(`+`, lapply(.SD, function(x) replace(x, is.na(x), 0))), .SDcols = listCol][]
```
**Another solution:** use `rowSums` with `na.rm = TRUE`:

```r
DT[, Sum := rowSums(.SD, na.rm = TRUE), .SDcols = grep("^i", names(DT))]
```
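A base-R sketch comparing the two NA strategies on hypothetical data with missing values (the question's data has none). Both effectively treat `NA` as 0, so they agree, including on a row that is entirely `NA`, which sums to 0 rather than `NA`. Without either, a single `NA` makes the whole row sum `NA`:

```r
# Hypothetical data with missing values
df <- data.frame(
  i1 = c(1, NA, NA),
  i2 = c(2, 3, NA),
  i3 = c(5, 4, NA)
)

# No NA handling: any NA in a row makes that row's sum NA
plain <- rowSums(df)

# Strategy 1: skip NAs inside rowSums()
with_na_rm <- rowSums(df, na.rm = TRUE)

# Strategy 2: replace NAs with 0 in every column, then fold with `+`
with_replace <- Reduce(`+`, lapply(df, function(x) replace(x, is.na(x), 0)))
```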
A `dplyr` solution is to use `mutate_` with `paste(listCol, collapse = "+")`, although I suspect the `Reduce` solution is faster.

```r
library(dplyr)
DT <- mutate_(DT, sum = paste(listCol, collapse = "+"))
```
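Underscore verbs such as `mutate_()` were deprecated in dplyr 0.7.0. A sketch of a current equivalent, assuming dplyr 1.0.0 or later for `across()` (in dplyr 1.1.0 and later, `pick()` is the documented way to select columns without applying a function); `starts_with()` comes from tidyselect and is re-exported by dplyr:

```r
library(dplyr)

df <- data.frame(
  ref = c(3L, 3L, 3L, 3L),
  nb  = 12:15,
  i1  = c(0.000031, 0.044495, 0.822440, 0.322291),
  i2  = c(0.000183, 0.155732, 0.873416, 0.648545),
  i3  = c(0.000824, 0.533939, 0.838542, 0.990648),
  i4  = c(0.044495, 0.822440, 0.322291, 0.393595)
)

# across() with no function returns the selected columns unchanged,
# as a data frame that rowSums() then adds row by row
df <- df %>% mutate(sum = rowSums(across(starts_with("i"))))
```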