dplyr summarise: group by multiple variables in a loop and add the results to the same dataframe
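I want to compute indicators for the different levels of several variables, and then add these results to a single data frame. I can do this without any problem using several summarise plus group_by calls, followed by an rbind to collect the results. Below, I do this on the hdv2003 data (from the questionr package), and I rbind the results for the variables 'sexe', 'trav.satisf' and 'cuisine'.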
library(questionr)
library(tidyverse)
data(hdv2003)
tmp_sexe <- hdv2003 %>%
  group_by(sexe) %>%
  summarise(n = n(),
            percent = round((n()/nrow(hdv2003))*100, digits = 1),
            femmes = round((sum(sexe == "Femme", na.rm = TRUE)/sum(!is.na(sexe)))*100, digits = 1),
            age = round(mean(age, na.rm = TRUE), digits = 1)
  )
names(tmp_sexe)[1] <- "group"
tmp_trav.satisf <- hdv2003 %>%
  group_by(trav.satisf) %>%
  summarise(n = n(),
            percent = round((n()/nrow(hdv2003))*100, digits = 1),
            femmes = round((sum(sexe == "Femme", na.rm = TRUE)/sum(!is.na(sexe)))*100, digits = 1),
            age = round(mean(age, na.rm = TRUE), digits = 1)
  )
names(tmp_trav.satisf)[1] <- "group"
tmp_cuisine <- hdv2003 %>%
  group_by(cuisine) %>%
  summarise(n = n(),
            percent = round((n()/nrow(hdv2003))*100, digits = 1),
            femmes = round((sum(sexe == "Femme", na.rm = TRUE)/sum(!is.na(sexe)))*100, digits = 1),
            age = round(mean(age, na.rm = TRUE), digits = 1)
  )
names(tmp_cuisine)[1] <- "group"
synthese <- rbind(tmp_sexe,
                  tmp_trav.satisf,
                  tmp_cuisine)
Here is the result:
# A tibble: 8 x 5
group n percent femmes age
<fct> <int> <dbl> <dbl> <dbl>
1 Homme 899 45 0 48.2
2 Femme 1101 55 100 48.2
3 Satisfaction 480 24 51.5 41.4
4 Insatisfaction 117 5.9 47.9 40.3
5 Equilibre 451 22.6 49.9 40.9
6 NA 952 47.6 60.2 56
7 Non 1119 56 43.8 50.1
8 Oui 881 44 69.4 45.6
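The problem is that this is long to write and hard to maintain. So I would like to produce the same result with a for loop, but I have a lot of trouble with loops in R and can't get it to work. Here is my attempt: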
groups <- c("sexe",
            "trav.satisf",
            "cuisine")
synthese <- tibble()
for (i in seq_along(groups)) {
  tmp <- hdv2003 %>%
    group_by(!!groups[i]) %>%
    summarise(n = n(),
              percent = round((n()/nrow(hdv2003))*100, digits = 1),
              femmes = round((sum(sexe == "Femme", na.rm = TRUE)/sum(!is.na(sexe)))*100, digits = 1),
              age = round(mean(age, na.rm = TRUE), digits = 1)
    )
  names(tmp)[1] <- "group"
  synthese <- bind_rows(synthese, tmp)
}
It runs, but it does not produce the expected result, and I don't understand why:
# A tibble: 3 x 5
group n percent femmes age
<chr> <int> <dbl> <dbl> <dbl>
1 sexe 2000 100 55 48.2
2 trav.satisf 2000 100 55 48.2
3 cuisine 2000 100 55 48.2
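Why the loop collapses: !!groups[i] injects the string "sexe" rather than the column sexe, so group_by() groups on a new constant column whose only value is "sexe". Every iteration therefore summarises all 2000 rows as a single group, which matches the 3-row output above (n = 2000, percent = 100). A minimal sketch of a fix, assuming dplyr >= 1.0, is to look the column up by name with the .data[[ ]] pronoun (!!sym(groups[i]) should also work). To keep it runnable without questionr, it uses a small made-up data frame, hdv_toy, as a hypothetical stand-in for hdv2003 with the same column names.

```r
library(dplyr)

# Hypothetical stand-in for hdv2003: made-up rows, same column names
hdv_toy <- tibble(
  sexe    = factor(c("Homme", "Femme", "Femme", "Homme", "Femme")),
  cuisine = factor(c("Oui", "Non", "Oui", "Non", "Oui")),
  age     = c(30, 40, 50, 60, 20)
)

groups <- c("sexe", "cuisine")
synthese <- tibble()

for (i in seq_along(groups)) {
  tmp <- hdv_toy %>%
    # .data[[name]] refers to the column whose name is stored in the string,
    # instead of grouping on the string itself as !!groups[i] did
    group_by(.data[[groups[i]]]) %>%
    summarise(n = n(),
              percent = round((n() / nrow(hdv_toy)) * 100, digits = 1),
              femmes = round((sum(sexe == "Femme", na.rm = TRUE) / sum(!is.na(sexe))) * 100, digits = 1),
              age = round(mean(age, na.rm = TRUE), digits = 1)
    )
  names(tmp)[1] <- "group"  # the grouping column is always the first one
  synthese <- bind_rows(synthese, tmp)
}
```

Replacing hdv_toy with hdv2003 and putting "trav.satisf" back into groups should give the same 8 rows as the manual rbind version.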
library(questionr)
library(tidyverse)
data(hdv2003)
list("trav.satisf", "cuisine", "sexe") %>%
  map(~ {
    hdv2003 %>%
      group_by_at(.x) %>%
      summarise(
        n = n(),
        percent = round((n() / nrow(hdv2003)) * 100, digits = 1),
        femmes = round((sum(sexe == "Femme", na.rm = TRUE) / sum(!is.na(sexe))) * 100, digits = 1),
        age = round(mean(age, na.rm = TRUE), digits = 1)
      ) %>%
      rename_at(1, ~"group") %>%
      mutate(grouping = .x)
  }) %>%
  bind_rows() %>%
  select(grouping, group, everything())
#> # A tibble: 8 x 6
#> grouping group n percent femmes age
#> <chr> <fct> <int> <dbl> <dbl> <dbl>
#> 1 trav.satisf Satisfaction 480 24 51.5 41.4
#> 2 trav.satisf Insatisfaction 117 5.9 47.9 40.3
#> 3 trav.satisf Equilibre 451 22.6 49.9 40.9
#> 4 trav.satisf <NA> 952 47.6 60.2 56
#> 5 cuisine Non 1119 56 43.8 50.1
#> 6 cuisine Oui 881 44 69.4 45.6
#> 7 sexe Homme 899 45 0 48.2
#> 8 sexe Femme 1101 55 100 48.2
Created on 2021-11-12 by the reprex package (v2.0.1)
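The answer above relies on group_by_at() and rename_at(), which dplyr 1.0 marks as superseded. Below is a sketch of the same idea with current verbs, assuming dplyr >= 1.0 and purrr: a hypothetical helper, summarise_by(), builds the table for one variable, rename(group = 1) renames the first (grouping) column by position, and bind_rows(.id = "grouping") turns the list names into the grouping column, so the final select() is no longer needed. It again uses a made-up hdv_toy data frame in place of hdv2003.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for hdv2003: made-up rows, same column names
hdv_toy <- tibble(
  sexe    = factor(c("Homme", "Femme", "Femme", "Homme", "Femme")),
  cuisine = factor(c("Oui", "Non", "Oui", "Non", "Oui")),
  age     = c(30, 40, 50, 60, 20)
)

# Hypothetical helper: summary table for the one grouping column named by `var`
summarise_by <- function(data, var) {
  total <- nrow(data)  # computed once, outside the data mask
  data %>%
    group_by(.data[[var]]) %>%
    summarise(
      n = n(),
      percent = round((n() / total) * 100, digits = 1),
      femmes = round((sum(sexe == "Femme", na.rm = TRUE) / sum(!is.na(sexe))) * 100, digits = 1),
      age = round(mean(age, na.rm = TRUE), digits = 1),
      .groups = "drop"
    ) %>%
    rename(group = 1)  # the grouping column comes first
}

vars <- c("sexe", "cuisine")
synthese <- vars %>%
  set_names() %>%                       # list names become "sexe", "cuisine"
  map(~ summarise_by(hdv_toy, .x)) %>%
  bind_rows(.id = "grouping")           # list names go into a `grouping` column
```

Because summarise_by() takes the data as an argument, calling it on a single variable is also a quick way to check one block before binding everything together.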