Nest multiple dplyr::summarise calls with different grouping variables
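I have a data frame with 100 records, including BMI class (above or below 30), waist circumference (WC) class (above or below a threshold), and an outcome variable (deceased, 0 or 1).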
library(dplyr)  # provides tibble(), group_by(), summarise() and %>%
set.seed(1)
data <-
  tibble(bmiclass = sample(x = c(0, 1), size = 100, replace = TRUE),
         wcclass  = sample(x = c(0, 1), size = 100, replace = TRUE),
         deceased = sample(x = c(0, 1), size = 100, replace = TRUE))
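I need two pieces of information in the same table: 1) the percentage of subjects in the higher WC class by BMI group, and 2) the risk of death by BMI group and WC class.
I managed to do this by joining two dplyr::group_by and dplyr::summarise pipelines with left_join, as follows: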
data %>% group_by(bmiclass, wcclass) %>% dplyr::summarise(risk.death=sum(deceased)/n()*100) %>%
left_join(data %>% group_by(bmiclass) %>% dplyr::summarise(risk.wc=sum(wcclass)/n()*100), by="bmiclass")
But I wonder whether there is a more direct, simpler way to do this without left_join?
This does the same thing without the join:
data %>%
  group_by(bmiclass) %>%
  mutate(risk.wc = sum(wcclass)/n()*100) %>%
  group_by(bmiclass, wcclass, risk.wc) %>%
  summarise(risk.death = sum(deceased)/n()*100)
# A tibble: 4 x 4
# Groups: bmiclass, wcclass [4]
bmiclass wcclass risk.wc risk.death
<dbl> <dbl> <dbl> <dbl>
1 0 0 49.0 52
2 0 1 49.0 50
3 1 0 45.1 64.3
4 1 1 45.1 56.5
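The reason `risk.wc` can be added to the second `group_by()` without creating extra groups is that it is constant within each `bmiclass`. A minimal sketch to check that, reusing the question's simulated data (the name `with_wc` is made up for illustration, and `mean(wcclass) * 100` is used as shorthand for `sum(wcclass)/n()*100`):

```r
library(dplyr)

# Same simulated data as in the question
set.seed(1)
data <- tibble(
  bmiclass = sample(c(0, 1), size = 100, replace = TRUE),
  wcclass  = sample(c(0, 1), size = 100, replace = TRUE),
  deceased = sample(c(0, 1), size = 100, replace = TRUE)
)

# Attach the per-BMI-class percentage to every row without collapsing rows
with_wc <- data %>%
  group_by(bmiclass) %>%
  mutate(risk.wc = mean(wcclass) * 100) %>%
  ungroup()

# One distinct risk.wc value per bmiclass
with_wc %>% distinct(bmiclass, risk.wc)

# Grouping by risk.wc as well therefore does not change the number of groups
n_groups(group_by(with_wc, bmiclass, wcclass)) ==
  n_groups(group_by(with_wc, bmiclass, wcclass, risk.wc))
```

Because the extra grouping column only carries a value along, the final `summarise()` still returns one row per `bmiclass` and `wcclass` combination.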
Checking against your code:
> data %>% group_by(bmiclass, wcclass) %>% dplyr::summarise(risk.death=sum(deceased)/n()*100) %>%
+ left_join(data %>% group_by(bmiclass) %>% dplyr::summarise(risk.wc=sum(wcclass)/n()*100), by="bmiclass")
`summarise()` has grouped output by 'bmiclass'. You can override using the `.groups` argument.
# A tibble: 4 x 4
# Groups: bmiclass [2]
bmiclass wcclass risk.death risk.wc
<dbl> <dbl> <dbl> <dbl>
1 0 0 52 49.0
2 0 1 50 49.0
3 1 0 64.3 45.1
4 1 1 56.5 45.1
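The message in the output above comes from `summarise()` leaving the result grouped by `bmiclass`. As a sketch, assuming dplyr 1.0.0 or later (where the `.groups` argument is available), passing `.groups = "drop"` silences it, and the two results can then be compared programmatically rather than by eye. The names `joined`, `mutated` and `same` are illustrative only.

```r
library(dplyr)

# Same simulated data as in the question
set.seed(1)
data <- tibble(
  bmiclass = sample(c(0, 1), size = 100, replace = TRUE),
  wcclass  = sample(c(0, 1), size = 100, replace = TRUE),
  deceased = sample(c(0, 1), size = 100, replace = TRUE)
)

# Original approach: two summaries combined with left_join()
joined <- data %>%
  group_by(bmiclass, wcclass) %>%
  summarise(risk.death = sum(deceased) / n() * 100, .groups = "drop") %>%
  left_join(
    data %>%
      group_by(bmiclass) %>%
      summarise(risk.wc = sum(wcclass) / n() * 100, .groups = "drop"),
    by = "bmiclass"
  )

# Join-free approach: mutate() first, then summarise()
mutated <- data %>%
  group_by(bmiclass) %>%
  mutate(risk.wc = sum(wcclass) / n() * 100) %>%
  group_by(bmiclass, wcclass, risk.wc) %>%
  summarise(risk.death = sum(deceased) / n() * 100, .groups = "drop")

# Align the column order, then compare as plain data frames
same <- all.equal(
  as.data.frame(joined),
  as.data.frame(select(mutated, bmiclass, wcclass, risk.death, risk.wc))
)
isTRUE(same)
```

Dropping the groups in both pipelines also removes the difference in the `# Groups:` header between the two printed results.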
Without performing a join, you can do the following:
library(dplyr)
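data %>%
  group_by(bmiclass, wcclass) %>%
  # risk.wc temporarily holds the group size; the result stays grouped by bmiclass
  summarise(risk.death = mean(deceased*100),
            risk.wc = n()) %>%
  # rebuild the per-row wcclass values from the counts and average them within bmiclass
  mutate(risk.wc = mean(rep(wcclass, risk.wc)) * 100) %>%
  ungroup()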
# bmiclass wcclass risk.death risk.wc
# <dbl> <dbl> <dbl> <dbl>
#1 0 0 52 49.0
#2 0 1 50 49.0
#3 1 0 64.3 45.1
#4 1 1 56.5 45.1
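The `rep()` call above rebuilds the per-row `wcclass` values from the group counts before averaging them. A possibly more readable variant of the same idea, offered as a sketch rather than as the answerer's method, is a count-weighted mean using base R's `weighted.mean()`. The helper column `n` and the name `res` are introduced here for illustration, and the `.groups` argument assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Same simulated data as in the question
set.seed(1)
data <- tibble(
  bmiclass = sample(c(0, 1), size = 100, replace = TRUE),
  wcclass  = sample(c(0, 1), size = 100, replace = TRUE),
  deceased = sample(c(0, 1), size = 100, replace = TRUE)
)

res <- data %>%
  group_by(bmiclass, wcclass) %>%
  # keep the group size so it can serve as a weight in the next step;
  # "drop_last" leaves the result grouped by bmiclass only
  summarise(risk.death = mean(deceased) * 100, n = n(), .groups = "drop_last") %>%
  # within each bmiclass, average wcclass weighted by how many rows each level had
  mutate(risk.wc = weighted.mean(wcclass, n) * 100) %>%
  ungroup() %>%
  select(-n)

res
```

Weighting each `wcclass` level by its row count gives the same quantity as averaging the original rows, so `risk.wc` is again the percentage of subjects in the higher WC class per BMI group.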