R Loop To New Data Frame Summary Weighted
I have a tall data frame like this:
# Note: no seed is set, so the sampled scores differ from run to run
data = data.frame("id"=c(1,2,3,4,5,6,7,8,9,10),
"group"=c(1,1,2,1,2,2,2,2,1,2),
"type"=c(1,1,2,3,2,2,3,3,3,1),
"score1"=sample(1:4,10,replace=TRUE),
"score2"=sample(1:4,10,replace=TRUE),
"score3"=sample(1:4,10,replace=TRUE),
"score4"=sample(1:4,10,replace=TRUE),
"score5"=sample(1:4,10,replace=TRUE),
"weight1"=c(173,109,136,189,186,146,173,102,178,174),
"weight2"=c(147,187,125,126,120,165,142,129,144,197),
"weight3"=c(103,192,102,159,128,179,195,193,135,145),
"weight4"=c(114,182,199,101,111,116,198,123,119,181),
"weight5"=c(159,125,104,171,166,154,197,124,180,154))
library(reshape2)
library(plyr)
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
data1 <- data1[order(data1$id), ]
What I want to create is a new data frame like this:
want = data.frame("score"=rep(1:4,6),
"group"=rep(1:2,12),
"type"=rep(1:3,8),
"weightedCOUNT"=NA) # how to calculate this? count(data1, score, wt = weight)
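I am just not sure how to calculate weightedCOUNT. It should apply the weights to the score variable, so that the column 'weightedCOUNT' holds the weighted count aggregated by score, group, and type.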
One option is melt from data.table, which can take multiple measure patterns, followed by a count of score weighted by weight within each 'group' and 'type':
library(data.table)
library(dplyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
value.name = c("score", "weight")) %>%
group_by(group, type) %>%
count(score, wt = weight)
If we need the complete set of group combinations, including those with no observations:
library(tidyr)
melt(setDT(data), measure = patterns('^score', "^weight"),
value.name = c("score", "weight")) %>%
group_by(group, type) %>%
count(score, wt = weight) %>% # weighted count; creates the column n that complete() fills
ungroup %>%
complete(group, type, score, fill = list(n = 0))
If I understand correctly, weightedCOUNT is the sum of weight grouped by score, group, and type.
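To make that reading concrete, here is a minimal base R sketch on a small made-up long-format table. The data frame toy and its values are hypothetical and are not the OP's data; the point is only that a "weighted count" of a cell is the sum of the weights of the rows that fall into it.

```r
# Hypothetical long-format data: one row per (id, measurement)
toy <- data.frame(
  score  = c(1, 1, 2, 2, 2),
  group  = c(1, 1, 1, 2, 2),
  type   = c(1, 1, 1, 3, 3),
  weight = c(10, 5, 7, 2, 3)
)

# Sum the weights within each observed score/group/type cell
weighted <- aggregate(weight ~ score + group + type, data = toy, FUN = sum)

# Rename the aggregated column to match the name used in the question
names(weighted)[names(weighted) == "weight"] <- "weightedCOUNT"
weighted
```

The two rows with score 1, group 1, type 1 collapse into one row whose weightedCOUNT is 10 + 5, and the grand total of weightedCOUNT equals the total of weight.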
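For the sake of completeness, I would like to show what the solution looks like when implemented in pure base R and in pure data.table syntax, respectively.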
Base R
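The OP was almost there. He has already reshaped data from wide to long format for multiple value variables. Only the final aggregation step was missing: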
data1 <- reshape(data, direction = "long",
varying = list(c(paste0("score",1:5)),c(paste0("weight",1:5))),
v.names = c("score","weight"),
idvar = "id", timevar = "count", times = c(1:5))
result <- aggregate(weight ~ score + group + type, data1, FUN = sum)
result
score group type weight
1 1 1 1 479
2 3 1 1 558
3 4 1 1 454
4 1 2 1 378
5 2 2 1 154
6 3 2 1 174
7 4 2 1 145
8 1 2 2 535
9 2 2 2 855
10 3 2 2 248
11 4 2 2 499
12 1 1 3 189
13 2 1 3 351
14 3 1 3 600
15 4 1 3 362
16 1 2 3 596
17 2 2 3 265
18 3 2 3 193
19 4 2 3 522
result can be reordered with
with(result, result[order(score, group, type), ])
score group type weight
1 1 1 1 479
12 1 1 3 189
4 1 2 1 378
8 1 2 2 535
16 1 2 3 596
13 2 1 3 351
5 2 2 1 154
9 2 2 2 855
17 2 2 3 265
2 3 1 1 558
14 3 1 3 600
6 3 2 1 174
10 3 2 2 248
18 3 2 3 193
3 4 1 1 454
15 4 1 3 362
7 4 2 1 145
11 4 2 2 499
19 4 2 3 522
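aggregate() only returns cells that actually occur in the data. If the empty combinations are wanted in base R as well, one possible route (a swapped-in technique, not part of the answer above) is xtabs(), which sums the left-hand side over the full cross-classification. The sketch below uses a hypothetical toy table rather than the OP's data.

```r
# Hypothetical long-format data, not the OP's data
toy <- data.frame(
  score  = c(1, 1, 2, 2, 2),
  group  = c(1, 1, 1, 2, 2),
  type   = c(1, 1, 1, 3, 3),
  weight = c(10, 5, 7, 2, 3)
)

# Sum of weight for every combination of the observed levels;
# combinations without any rows get 0
tab <- xtabs(weight ~ score + group + type, data = toy)

# Back to a long data frame; the grouping columns come back as character
full <- as.data.frame(tab, responseName = "weightedCOUNT",
                      stringsAsFactors = FALSE)
full
```

Only levels that appear somewhere in the data are crossed, so a type that never occurs at all would still be absent. The grouping columns can be converted back with as.numeric() if numeric columns are needed.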
data.table
As shown above, melt() from the data.table package can be combined with dplyr. Alternatively, we can stay with the data.table syntax for the aggregation:
library(data.table)
cols <- c("score", "weight") # to save typing
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)]
score group type weightedCOUNT
1: 1 1 1 479
2: 1 1 3 189
3: 1 2 1 378
4: 1 2 2 535
5: 1 2 3 596
6: 2 1 3 351
7: 2 2 1 154
8: 2 2 2 855
9: 2 2 3 265
10: 3 1 1 558
11: 3 1 3 600
12: 3 2 1 174
13: 3 2 2 248
14: 3 2 3 193
15: 4 1 1 454
16: 4 1 3 362
17: 4 2 1 145
18: 4 2 2 499
19: 4 2 3 522
The keyby parameter is used to group and order the output in one step.
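The difference between by and keyby can be seen on a tiny table. This is a minimal sketch with hypothetical values (toy is not the OP's data): by keeps the groups in order of first appearance, while keyby returns them sorted and sets a key on the result.

```r
library(data.table)

# Hypothetical data in which score 2 appears before score 1
toy <- data.table(score = c(2, 1, 2, 1), weight = c(7, 10, 3, 5))

# by: groups appear in order of first appearance (2, then 1)
by_res <- toy[, .(weightedCOUNT = sum(weight)), by = score]

# keyby: groups are sorted (1, then 2) and the result is keyed on score
keyby_res <- toy[, .(weightedCOUNT = sum(weight)), keyby = score]

by_res
keyby_res
key(keyby_res)
```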
Completing the missing combinations of the grouping variables is also possible in data.table syntax, using the cross join function CJ():
melt(setDT(data), measure = patterns(cols), value.name = cols)[
, .(weightedCOUNT = sum(weight)), keyby = .(score, group, type)][
CJ(score, group, type, unique = TRUE), on = .(score, group, type)][
is.na(weightedCOUNT), weightedCOUNT := 0][]
score group type weightedCOUNT
1: 1 1 1 479
2: 1 1 2 0
3: 1 1 3 189
4: 1 2 1 378
5: 1 2 2 535
6: 1 2 3 596
7: 2 1 1 0
8: 2 1 2 0
9: 2 1 3 351
10: 2 2 1 154
11: 2 2 2 855
12: 2 2 3 265
13: 3 1 1 558
14: 3 1 2 0
15: 3 1 3 600
16: 3 2 1 174
17: 3 2 2 248
18: 3 2 3 193
19: 4 1 1 454
20: 4 1 2 0
21: 4 1 3 362
22: 4 2 1 145
23: 4 2 2 499
24: 4 2 3 522
score group type weightedCOUNT
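The chained call above does three things at once: it builds the grid of all combinations, joins the aggregate onto it, and replaces the resulting NA values with 0. The steps can be sketched separately on a small hypothetical aggregate (agg, grid and full are illustrative names, and the values are made up):

```r
library(data.table)

# Hypothetical aggregated result with one combination (score 1, group 2) missing
agg <- data.table(score = c(1, 2, 2), group = c(1, 1, 2),
                  weightedCOUNT = c(15, 7, 5))

# Step 1: all combinations of the unique values of the grouping columns
grid <- CJ(score = agg$score, group = agg$group, unique = TRUE)

# Step 2: look up each grid row in agg; unmatched rows get NA
full <- agg[grid, on = .(score, group)]

# Step 3: replace NA by 0, modifying full by reference
full[is.na(weightedCOUNT), weightedCOUNT := 0]
full[]
```

The trailing [] only forces printing after the := assignment, as in the chained version.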