dplyr: append summarise rows by threshold variable
Constraint: use dplyr or another tidyverse library.
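Objective: I want to summarise data using a threshold. The threshold takes many values, and I want to append/collate the summary results for each value into one table.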
Minimal reproducible example:
df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1),
                 colC = c(0, 5, 2, 3),
                 colD = c(2, 4, 4, 2))
> df
  colA colB colC colD
1    1    0    0    2
2    2    0    5    4
3    1    3    2    4
4    1    1    3    2
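Current: single threshold
library(dplyr)

# Note: df$colB refers to the whole column, so these sums ignore the grouping
df.ans <- df %>%
  group_by(colA) %>%
  summarize(theshold = 1,
            calcB = sum(df$colB[df$colB > theshold] - 1),
            calcC = sum(df$colC[df$colC > theshold] - 1),
            calcD = sum(df$colD[df$colD > theshold] - 1))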
> df.ans
# A tibble: 2 x 5
   colA theshold calcB calcC calcD
  <dbl>    <dbl> <dbl> <dbl> <dbl>
1     1        1     2     7     8
2     2        1     2     7     8
Expected: multiple thresholds
> df.ans
# A tibble: 6 x 5
   colA theshold calcB calcC calcD
  <dbl>    <dbl> <dbl> <dbl> <dbl>
1     1        1     2     7     8
2     2        1     2     7     8
3     1        2 ....
4     2        2 ....
5     1        3 ....
6     2        3 ....
Just write a function that does the thresholding:
thresh_fun <- function(df, threshold) {
  df %>%
    group_by(colA) %>%
    summarize(threshold = threshold,
              calcB = sum(colB[colB > threshold] - 1),
              calcC = sum(colC[colC > threshold] - 1),
              calcD = sum(colD[colD > threshold] - 1))
}
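Note that this function refers to colB inside summarize(), whereas the question's attempt used df$colB. As far as I can tell, that is why the per-group numbers differ: df$colB always pulls the full column, while the bare name is evaluated within each group. A minimal sketch of the difference, using only colA and colB from the example data (the names cmp, whole_column and per_group are made up for illustration):

```r
library(dplyr)

df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1))

cmp <- df %>%
  group_by(colA) %>%
  summarise(
    # df$colB is the entire column, so every group gets the same sum
    whole_column = sum(df$colB[df$colB > 1] - 1),
    # bare colB is restricted to the rows of the current group
    per_group    = sum(colB[colB > 1] - 1)
  )

print(cmp)
```

With this data, whole_column should be identical for both groups, while per_group should only be non-zero for colA == 1, since the single value above the threshold sits in that group.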
Then map it over each threshold value and bind the results into one data.frame:
library(purrr)  # for map_df
map_df(1:3, ~thresh_fun(df, .))
#    colA threshold calcB calcC calcD
#   <dbl>     <int> <dbl> <dbl> <dbl>
# 1     1         1     2     3     5
# 2     2         1     0     4     3
# 3     1         2     2     2     3
# 4     2         2     0     4     3
# 5     1         3     0     0     3
# 6     2         3     0     4     3
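For what it's worth, here are two variations on the same idea, offered as a sketch rather than a replacement. The first assumes purrr >= 1.0.0, where map() followed by list_rbind() is the suggested alternative to map_df(). The second assumes dplyr >= 1.1.0 for cross_join() and skips the helper function by pairing every row with every threshold and grouping on both. The names by_map and by_join are just placeholders.

```r
library(dplyr)
library(purrr)

df <- data.frame(colA = c(1, 2, 1, 1),
                 colB = c(0, 0, 3, 1),
                 colC = c(0, 5, 2, 3),
                 colD = c(2, 4, 4, 2))

# Same helper as in the answer above
thresh_fun <- function(df, threshold) {
  df %>%
    group_by(colA) %>%
    summarize(threshold = threshold,
              calcB = sum(colB[colB > threshold] - 1),
              calcC = sum(colC[colC > threshold] - 1),
              calcD = sum(colD[colD > threshold] - 1))
}

# Variant 1: map() returns a list of tibbles, list_rbind() stacks them
# (assumes purrr >= 1.0.0)
by_map <- map(1:3, function(t) thresh_fun(df, t)) %>%
  list_rbind()

# Variant 2: cross join thresholds with the data, then group by both
# (assumes dplyr >= 1.1.0 for cross_join())
by_join <- cross_join(data.frame(threshold = 1:3), df) %>%
  group_by(threshold, colA) %>%
  summarise(calcB = sum(colB[colB > threshold] - 1),
            calcC = sum(colC[colC > threshold] - 1),
            calcD = sum(colD[colD > threshold] - 1),
            .groups = "drop") %>%
  relocate(colA)  # put colA first to match the column order of by_map

print(by_map)
print(by_join)
```

Both variants should give the same six rows as the map_df() call. The cross join copies the data once per threshold, so it is probably only sensible when the number of thresholds is small.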