R Data.Table With Weights
library(data.table)

# Example data: 100 students, each with a sampling weight, two grouping
# labels, and five binary (0/1) indicator columns
data = data.table("STUDENT" = c(1:100),
                  "SAMPLEWEIGHT" = sample(12:99, 100, replace = TRUE),
                  "LABEL1" = sample(1:2, 100, replace = TRUE),
                  "LABEL3" = sample(1:3, 100, replace = TRUE),
                  "CAT" = sample(0:1, 100, replace = TRUE),
                  "FOX" = sample(0:1, 100, replace = TRUE),
                  "DOG" = sample(0:1, 100, replace = TRUE),
                  "MOUSE" = sample(0:1, 100, replace = TRUE),
                  "BIRD" = sample(0:1, 100, replace = TRUE))

# Desired output layout: one row per LABEL1 x LABEL3 combination
dataWANT = data.frame("LABEL1" = c(1,1,1,2,2,2),
                      "LABEL3" = c(1,2,3,1,2,3),
                      "CAT_N" = NA,
                      "CAT_PER" = NA,
                      "FOX_N" = NA,
                      "FOX_PER" = NA,
                      "DOG_N" = NA,
                      "DOG_PER" = NA,
                      "MOUSE_N" = NA,
                      "MOUSE_PER" = NA,
                      "BIRD_N" = NA,
                      "BIRD_PER" = NA)
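I have a data.table, call it data, and I am trying to summarize the student data as shown in dataWANT.
In dataWANT, each column ending in _N is simply the count of values equal to 1 in the corresponding column, for each combination of LABEL1 and LABEL3, so there are 6 groups in total.
In dataWANT, each column ending in _PER is the weighted proportion (weighted by SAMPLEWEIGHT) of the group that has a value of 1 in the corresponding column.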
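To make the two definitions concrete, here is a minimal sketch on a made-up group of three students. The names cat_flag and w and their values are hypothetical and are not taken from the data above; the sketch only assumes that _PER means "sum of the weights of the rows equal to 1, divided by the sum of all weights in the group".

```r
# Hypothetical toy group: three students with made-up values
cat_flag <- c(1, 0, 1)      # binary indicator, standing in for e.g. CAT
w        <- c(10, 30, 60)   # sampling weights, standing in for SAMPLEWEIGHT

# _N: unweighted count of rows equal to 1
cat_n <- sum(cat_flag == 1)                   # 2

# _PER: weighted proportion of rows equal to 1
cat_per <- sum(w * (cat_flag == 1)) / sum(w)  # (10 + 60) / 100 = 0.7

# weighted.mean() from the stats package computes the same ratio
same <- isTRUE(all.equal(cat_per, weighted.mean(cat_flag == 1, w)))
```

The unweighted proportion here would be 2/3, so the weights do change the result whenever they differ across rows.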
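One option with data.table is to group by 'LABEL1' and 'LABEL3', specify the columns of interest in .SDcols, get the sum of each column (a sum is a count of 1s because the columns are binary) by looping over .SD, and concatenate those sums with the weighted.mean of each column, weighted by the 'SAMPLEWEIGHT' column.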
library(data.table)

# _N: per-group count of 1s; _PER: per-group proportion of 1s weighted by SAMPLEWEIGHT
data[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
         setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                  paste0(names(.SD), "_PER"))),
     by = .(LABEL1, LABEL3), .SDcols = CAT:BIRD]
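The call above returns all _N columns first and then all _PER columns, with groups in order of first appearance, whereas dataWANT interleaves the two per animal and lists the groups in sorted order. The following is a sketch of one possible way to line the result up with that layout and to sanity-check one group by hand. It uses a smaller stand-in table (toy, with only CAT and BIRD) and an arbitrary seed, both of which are assumptions made for illustration rather than part of the original question or answer.

```r
library(data.table)

set.seed(1)  # assumed seed, only so the sketch is reproducible
toy <- data.table(
  SAMPLEWEIGHT = sample(12:99, 100, replace = TRUE),
  LABEL1       = sample(1:2, 100, replace = TRUE),
  LABEL3       = sample(1:3, 100, replace = TRUE),
  CAT          = sample(0:1, 100, replace = TRUE),
  BIRD         = sample(0:1, 100, replace = TRUE)
)
cols <- c("CAT", "BIRD")

# Same grouped summary as above, on the stand-in table
res <- toy[, c(setNames(lapply(.SD, sum), paste0(names(.SD), "_N")),
               setNames(lapply(.SD, function(x) weighted.mean(x == 1, SAMPLEWEIGHT)),
                        paste0(names(.SD), "_PER"))),
           by = .(LABEL1, LABEL3), .SDcols = cols]

# Interleave columns as CAT_N, CAT_PER, BIRD_N, BIRD_PER (rbind + as.vector
# reads the 2-row matrix column by column), then sort the rows by group
interleaved <- as.vector(rbind(paste0(cols, "_N"), paste0(cols, "_PER")))
setcolorder(res, c("LABEL1", "LABEL3", interleaved))
setorder(res, LABEL1, LABEL3)

# Manual check of the first group against the definitions of _N and _PER
first <- res[1]
g <- toy[LABEL1 == first$LABEL1 & LABEL3 == first$LABEL3]
stopifnot(first$CAT_N == sum(g$CAT))
stopifnot(isTRUE(all.equal(first$CAT_PER,
                           sum(g$SAMPLEWEIGHT * g$CAT) / sum(g$SAMPLEWEIGHT))))
```

Both setcolorder and setorder modify res by reference, so no reassignment is needed.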