Calculate sum and min max across each group in a data frame in R
I have the following sample data frame:
df <- data.frame("Group" = c(1, 1, 2, 2, 2),
                 "H" = c("H1", "H3", "H3", "H4", "H2"),
                 "W1" = c(95, 0, 0, 0, 50),
                 "W2" = c(0, 95, 95, 0, 85),
                 "W3" = c(85, 50, 50, 95, 0))
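I need to calculate two additional metrics.
First metric: for each group, look at that group's rows of W1, W2 and W3. If the group's maximum in each of W1, W2 and W3 is equal to or greater than 85, the output is 100%; otherwise it is the percentage of the three columns that reach 85.
For example, for group 2 the maximum values of W2 and W3 are equal to or greater than 85,
but the maximum of W1 is less than 85, so the result is 66.7 (two of the three columns).
Second metric: the minimum, across W1, W2 and W3, of each column's maximum within the group. For example, for group 2: min(max[0 0 50], max[95 0 85], max[50 95 0]) = 50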
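As a quick check of the two definitions, here is a minimal base R sketch for group 2 alone. The names g2, col_max, metric1 and metric2 are illustrative only and do not come from the original post.

```r
# Rows of group 2 only (values copied from the sample data frame)
g2 <- data.frame(W1 = c(0, 0, 50),
                 W2 = c(95, 0, 85),
                 W3 = c(50, 95, 0))

# Maximum of each W column within the group
col_max <- sapply(g2, max)

# First metric: percentage of columns whose group maximum reaches 85
metric1 <- 100 * mean(col_max >= 85)

# Second metric: smallest of the column maxima
metric2 <- min(col_max)

print(round(metric1, 1))
print(metric2)
```

Here two of the three column maxima reach 85, which is where the 66.7 comes from, and the smallest column maximum is the 50 from W1.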
To make it clearer, this is the desired output data frame:
DesiredDf <- data.frame("Group" = c(1, 1, 2, 2, 2),
                        "H" = c("H1", "H3", "H3", "H4", "H2"),
                        "W1" = c(95, 0, 0, 0, 50),
                        "W2" = c(0, 95, 95, 0, 85),
                        "W3" = c(85, 50, 50, 95, 0),
                        "W" = c(100, 100, 66.7, 66.7, 66.7),
                        MINMAX = c(85, 85, 50, 50, 50))
I have tried for loops and sapply, but the real data set is large and execution is too slow. I am looking for a more efficient way to calculate these metrics in R.
A data.table approach:
# use data.table
library(data.table)
setDT(df)
# aggregate data by group in order to calculate the 2 desired metrics
df1 <- df[, .(maxw1 = max(W1), maxw2 = max(W2), maxw3 = max(W3)), by = Group]
# calculate the metrics
# metric1 is a proportion between 0 and 1; multiply by 100 for a percentage
df1[, metric1 := rowMeans(cbind(maxw1 >= 85, maxw2 >= 85, maxw3 >= 85))]
# metric2 is the row-wise minimum of the three group maxima
df1[, metric2 := do.call(pmin, .SD), .SDcols = c("maxw1", "maxw2", "maxw3")]
# merge metrics back on to original dataframe
df <- merge(df, df1[, .(Group, metric1, metric2)], by = "Group")
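A possible variation on the data.table approach, sketched here under the assumption that adding the columns by reference is acceptable: both metrics can be assigned per group with `:=`, which avoids the intermediate table and the merge. The names dt, w_cols and col_max are illustrative, and the output columns W and MINMAX are named after the desired output in the question.

```r
library(data.table)

# Sample data from the question, built directly as a data.table
dt <- data.table(Group = c(1, 1, 2, 2, 2),
                 H  = c("H1", "H3", "H3", "H4", "H2"),
                 W1 = c(95, 0, 0, 0, 50),
                 W2 = c(0, 95, 95, 0, 85),
                 W3 = c(85, 50, 50, 95, 0))

w_cols <- c("W1", "W2", "W3")

# For each group, compute the column maxima once, then derive both metrics.
# The length-1 results are recycled to every row of the group.
dt[, c("W", "MINMAX") := {
  col_max <- sapply(.SD, max)
  list(100 * mean(col_max >= 85), min(col_max))
}, by = Group, .SDcols = w_cols]

print(dt)
```

Because `:=` modifies dt in place, the original row order is kept and no join is needed. Whether this is faster than the aggregate-and-merge version on a large data set would need to be measured.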
Using dplyr:
library(dplyr)
df %>%
  group_by(Group) %>%
  mutate(w = rowMeans(cbind(max(W1) >= 85, max(W2) >= 85, max(W3) >= 85)),
         minmax = min(max(W1), max(W2), max(W3)))
# A tibble: 5 x 7
# Groups:   Group [2]
  Group H        W1    W2    W3     w minmax
  <dbl> <fct> <dbl> <dbl> <dbl> <dbl>  <dbl>
1    1. H1      95.    0.   85. 1.00     85.
2    1. H3       0.   95.   50. 1.00     85.
3    2. H3       0.   95.   50. 0.667    50.
4    2. H4       0.    0.   95. 0.667    50.
5    2. H2      50.   85.    0. 0.667    50.