How to add a column when summarising with ddply
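My question is about summarising a data.frame with the ddply function, as in the example below.
The call creates a new data frame with the maximum rating for each company. What is missing is the corresponding ID from the original data frame.
I tried to pass the ID variable as well, but that produced an error message.
I am interested in the ID that belongs to the highest rating in each company.
Many thanks in advance for your help!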
dat <- data.frame(ID = c("A11", "A12", "A21", "A22", "A23", "A31"),
                  company = c("CompA", "CompA", "CompB", "CompB", "CompB", "CompC"),
                  rating = c(1, 4, 2, 5, 3, 4)
                  )
   ID company rating
1 A11   CompA      1
2 A12   CompA      4
3 A21   CompB      2
4 A22   CompB      5
5 A23   CompB      3
6 A31   CompC      4
library(plyr)
ddply(dat, "company", summarise, ratingMax = max(rating))
  company ratingMax
1   CompA         4
2   CompB         5
3   CompC         4
ddply(dat, "company", summarise, ratingMax = max(rating), ID = ID)
Error: length(rows) == 1 is not TRUE
You could try
library(plyr)
ddply(dat, "company", summarise, ratingMax = max(rating),
      ID = ID[which.max(rating)])
#  company ratingMax  ID
#1   CompA         4 A12
#2   CompB         5 A22
#3   CompC         4 A31
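As for why the original call failed: most likely summarise needs every expression to return the same number of values per group, and ID = ID returns one value per row while max(rating) returns a single value. The base R sketch below only counts how many IDs each company has, to show where the lengths disagree. The name ids_per_company is made up for illustration.

```r
# Same example data as in the question
dat <- data.frame(ID = c("A11", "A12", "A21", "A22", "A23", "A31"),
                  company = c("CompA", "CompA", "CompB", "CompB", "CompB", "CompC"),
                  rating = c(1, 4, 2, 5, 3, 4))

# Number of ID values in each company group
ids_per_company <- tapply(dat$ID, dat$company, length)
print(ids_per_company)
# CompA has 2 IDs and CompB has 3, but max(rating) is a single value per group,
# so the two summary columns cannot be combined into one row per company.
```

Indexing with ID[which.max(rating)] reduces ID to a single value per group, which is why the version above works.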
Or using dplyr
library(dplyr)
dat %>%
    group_by(company) %>%
    summarise(ratingMax = max(rating), ID = ID[which.max(rating)])
#  company ratingMax  ID
#1   CompA         4 A12
#2   CompB         5 A22
#3   CompC         4 A31
Or you can use filter, which keeps all columns of the matching row
dat %>%
    group_by(company) %>%
    filter(row_number() %in% which.max(rating))
Or use slice, as suggested by @docendo discimus (this is faster and more compact)
dat %>%
    group_by(company) %>%
    slice(which.max(rating))
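One thing to keep in mind with all of the which.max approaches: which.max returns only the first position of the maximum, so if two IDs in a company share the top rating, only the first one is kept. If you want every tied row, you can compare each rating with its group maximum instead. Below is a minimal base R sketch; the tied data frame is invented for illustration and is not part of the question.

```r
# Hypothetical data with a tie in CompA (not from the original question)
tied <- data.frame(ID = c("A11", "A12", "A21"),
                   company = c("CompA", "CompA", "CompB"),
                   rating = c(4, 4, 2))

# which.max() picks only the first of the two tied rows in CompA
compA <- tied[tied$company == "CompA", ]
first_only <- as.character(compA$ID[which.max(compA$rating)])

# ave() repeats each company's maximum on every row of that company,
# so the comparison keeps all rows that reach their group maximum
all_max <- tied[tied$rating == ave(tied$rating, tied$company, FUN = max), ]
print(first_only)
print(all_max)
```

In dplyr the same idea would be filter(rating == max(rating)) after group_by(company).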
Here is a quick data.table solution that saves you from naming the columns manually (in case you want to show more columns)
library(data.table)
setDT(dat)[, .SD[which.max(rating)], by = company]
#    company  ID rating
# 1:   CompA A12      4
# 2:   CompB A22      5
# 3:   CompC A31      4