Calculating percentages within category using geom_col

Question

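This question has been asked before in various forms, but I'm attempting it in a slightly different way and can't seem to get it exactly right. When I use this code: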

d %>% 
  drop_na(attend) %>% 
  count(race, attend) %>% 
  group_by(race) %>%
  mutate(percent = n/sum(n)*100) %>% 
  ggplot(aes(race, percent, fill = race)) +
  geom_col(position = "dodge")

I get this figure:

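The 'attend' variable is just 0s and 1s, and I want to show the percentage of 1s within each race. I think the lines showing up in the chart are actually correct, but what is going on with the rest of those columns? I can't quite figure out that last step.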

Answer 1

To get the result you want, filter your data for attend == 1 after computing the percentages.

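Note: The black lines appear because of overplotting. Even with position = "dodge" set, the bars for attend = 0 and attend = 1 are plotted on top of each other, because both rows for a race share the same x position and the same fill, so dodging does not separate them.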
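To make the overplotting concrete, here is a minimal sketch on a tiny made-up data frame. The name `toy` and its values are assumptions for illustration only, not the asker's data. It shows that, before filtering, each race still has two rows, which is why two bars end up at the same x position:

```r
library(dplyr)

# Hypothetical toy data (an assumption for illustration, not the asker's data)
toy <- data.frame(
  race   = c("A", "A", "A", "B", "B", "B", "B"),
  attend = c(0, 1, 1, 0, 0, 0, 1)
)

# Same percentage calculation as in the question
pct <- toy %>%
  count(race, attend) %>%
  group_by(race) %>%
  mutate(percent = n / sum(n) * 100) %>%
  ungroup()

# One row for attend = 0 and one for attend = 1 per race,
# so geom_col() would draw two bars per race at the same x
rows_per_race <- pct %>% count(race, name = "rows_per_race")
rows_per_race
```

Keeping only the attend == 1 rows leaves a single bar per race, which is what the filter step below does.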

Using some random example data:

library(tidyr)
library(dplyr)
library(ggplot2)

set.seed(123)

d <- data.frame(
  race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  attend = sample(0:1, 100, replace = TRUE)
)

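d %>% 
  drop_na(attend) %>% 
  # one row per race/attend combination, with the count in n
  count(race, attend) %>% 
  group_by(race) %>%
  # share of each attend value within its race
  mutate(percent = n/sum(n)*100) %>% 
  # keep only the share of 1s, so each race has a single bar
  filter(attend == 1) %>%
  # order the bars by percentage
  ggplot(aes(reorder(race, percent), percent, fill = race)) +
  geom_col()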
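As a possible alternative (a sketch, not part of the original answer): because attend only takes the values 0 and 1, the percentage of 1s within a race equals the group mean times 100. The name `pct_attend` is an assumption introduced here, and the example data is rebuilt so the block runs on its own:

```r
library(dplyr)
library(ggplot2)

set.seed(123)

# Same random example data as above
d <- data.frame(
  race = sample(c("Asian", "White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  attend = sample(0:1, 100, replace = TRUE)
)

# For a 0/1 variable, mean(attend) is the share of 1s in the group
pct_attend <- d %>%
  filter(!is.na(attend)) %>%
  group_by(race) %>%
  summarise(percent = mean(attend) * 100, .groups = "drop")

ggplot(pct_attend, aes(reorder(race, percent), percent, fill = race)) +
  geom_col()
```

One difference to be aware of: a race with no attend == 1 rows is dropped entirely by the count-and-filter approach, whereas this version keeps it with a percentage of 0.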
