dplyr summarise on group not flattening data
I have a dataset:
df <- structure(list(ID = c(101188, 101192, 101193, 101196, 101198,
101202, 101203, 101206, 101211, 101212, 101216, 101219, 101220,
101222, 101223, 101224, 101226, 101227, 101228, 101229), LA = c("Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham", "Barking and Dagenham", "Barking and Dagenham",
"Barking and Dagenham"), EstablishmentGroup = c("Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools", "Local authority maintained schools",
"Local authority maintained schools")), row.names = c(NA, -20L
), class = c("tbl_df", "tbl", "data.frame"))
If I run the code below, I expect the final summarise() to flatten the data and return the single row shown after it:
df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>%
  group_by(LA, EstablishmentGroup, All_schools) %>%
  summarise(total = n(),
            per = total/All_schools)
Barking and Dagenham Local authority maintained schools 20 20 1
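But it gives me 20 rows instead. I could use distinct(), but I'm not sure what I'm doing wrong.
You can summarise the counts first, then compute the percentage with mutate().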
df %>%
  group_by(LA) %>%
  mutate(All_schools = n()) %>%
  ungroup() %>%
  group_by(LA, EstablishmentGroup, All_schools) %>%
  summarise(total = n()) %>%
  mutate(per = total/All_schools)
Output:
# A tibble: 1 x 5
# Groups: LA, EstablishmentGroup [1]
LA EstablishmentGroup All_schools total per
<chr> <chr> <int> <int> <dbl>
1 Barking and Dagenham Local authority maintained schools 20 20 1
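A likely reason for the 20 rows: inside summarise(), All_schools still refers to the full column for the group (20 values), so total/All_schools is a length-20 vector, and dplyr 1.0 and later returns one row per element. Moving the division into mutate() after summarise() avoids this because each column then holds a single value per row. As an alternative sketch (not the approach from the answer above), the same result can probably be reached with count() followed by a grouped mutate(). The small df below is a stand-in for the question's data with the ID column left out.

```r
library(dplyr)

# Stand-in for the question's df (assumption: ID column omitted for brevity)
df <- tibble(
  LA = rep("Barking and Dagenham", 20),
  EstablishmentGroup = rep("Local authority maintained schools", 20)
)

result <- df %>%
  # One row per LA / EstablishmentGroup pair, with the row count in `total`
  count(LA, EstablishmentGroup, name = "total") %>%
  group_by(LA) %>%
  # Sum of the group counts gives the number of schools in the whole LA
  mutate(All_schools = sum(total),
         per = total / All_schools) %>%
  ungroup()

print(result)
```

This version computes All_schools from the summarised counts, so the initial group_by()/mutate()/ungroup() pass over the raw rows is no longer needed.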