Create percentage labels for two discrete variables in ggplot2

Here is some sample data:

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))
# ggplot() needs a data frame; c() would flatten both vectors into a single vector
df <- data.frame(gender, outcome)

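I want to create a ggplot in which the y-axis shows percentages, the x-axis is gender, and the fill is outcome. It must be a stacked bar chart with percentage labels.

I tried this code:

ggplot(df, aes(x = gender, fill = outcome)) + geom_bar()

But this gives me counts on the y-axis. I want percentages on the y-axis instead. The stacked bar for females must show the percentage of females with each outcome (responders and non-responders) within the female group, not the percentage of the total population who are female and respond or do not respond. For example, I would like to see something like 40% female responders versus 60% non-responders, and likewise for males.

To make the plot publication-ready, I also need to add labels for these percentages inside the stacked bars.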

Here it is with labels:

library(ggplot2)
gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1),  labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

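# Note: the factor 2 turns count/sum(count) into within-gender shares only
# because both genders have the same number of rows (5 each) in this data.
ggplot(df, aes(x = gender)) +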
  geom_bar(aes(y = 2*(..count..)/sum(..count..), fill = outcome, group=outcome), stat="count") +
  geom_label(aes(label = scales::percent(2*(..count..)/sum(..count..)),
                  group = outcome), position = "fill", stat= "count", vjust = 0) +
  labs(y = "Percent", fill="outcome") +
  scale_y_continuous(labels = scales::percent)

@Paul seems to have a better approach with geom_bar.

EDIT

Here is a general solution:

library(ggplot2)
gender <- c("female", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1),  labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

gg <- ggplot() + 
  geom_bar(aes(x= gender, fill = outcome), data = df, position = "fill")
ggb <- ggplot_build(gg)
df2 <- data.frame(y = ggb$data[[1]][["y"]])

gg + geom_label(
  aes(x = rep(c(1,2), each = 2), label = scales::percent(y), y = y), 
  data = df2
)
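The hard-coded `rep(c(1,2), each = 2)` above assumes exactly two genders with two outcomes each. As a hedged sketch of the same idea, the x positions can be read from the built plot alongside `y`, so nothing about the number of bars is assumed. The names `built` and `label_df` are illustrative, not part of the original answer.

```r
library(ggplot2)

gender <- c("female", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), labels = c("responders", "non-responders"))
df <- data.frame(gender, outcome)

gg <- ggplot() +
  geom_bar(aes(x = gender, fill = outcome), data = df, position = "fill")

# Data computed for the first (and only) layer: one row per bar segment
built <- ggplot_build(gg)$data[[1]]

# Take both the x position and the top of each segment from the built data
label_df <- data.frame(x = as.numeric(built$x), y = built$y)

gg + geom_label(
  aes(x = x, y = y, label = scales::percent(y)),
  data = label_df
)
```

As in the answer above, each label sits at the top edge of its segment and shows the cumulative height of the stack at that point, not the share of the individual segment.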

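The trick that avoids changing the data is to use geom_bar(position = "fill"). To format the y-axis labels, you have several options. Here are two of them:

  • Use the scales package: scales::percent_format()
  • Use a custom function instead: just replace the code above with function(x) paste0(x*100, "%")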
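To make the two options concrete, here is a small sketch that applies both formatters to a vector of proportions, which is roughly what scale_y_continuous(labels = ...) does with the axis breaks. The names `breaks`, `fmt_scales` and `fmt_custom` are made up for illustration, and the scales package is assumed to be installed.

```r
# Proportions such as the default axis breaks of a position = "fill" plot
breaks <- c(0, 0.25, 0.5, 0.75, 1)

# Option 1: percent_format() returns a function that formats numbers as percentages
fmt_scales <- scales::percent_format()

# Option 2: a hand-written formatter
fmt_custom <- function(x) paste0(x * 100, "%")

fmt_scales(breaks)
fmt_custom(breaks)
```

Either function can be passed as the labels argument of scale_y_continuous().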

Here it is:

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

library(ggplot2)
ggplot(data = df, aes(x = gender, fill = outcome)) +
  geom_bar(position="fill") +
  scale_y_continuous(labels = function(x) paste0(x*100, "%"))

Created on 2021-08-19 by the reprex package (v2.0.0)
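The question also asks for labels inside the stacked bars. One possible way to add them on top of position = "fill" without changing the data is sketched below. It assumes ggplot2 3.3.0 or later for after_stat(), and it computes the within-gender share by dividing each count by the total count at the same x position with ave(). This is an illustration, not part of the answer above.

```r
library(ggplot2)

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))
df <- data.frame(gender, outcome)

p <- ggplot(df, aes(x = gender, fill = outcome)) +
  geom_bar(position = "fill") +
  geom_text(
    # count divided by the total count at the same x position = share within gender
    aes(label = after_stat(scales::percent(count / ave(count, as.integer(x), FUN = sum)))),
    stat = "count",
    # stack the labels like the bars and centre each one in its segment
    position = position_fill(vjust = 0.5)
  ) +
  scale_y_continuous(labels = scales::percent_format())

p
```

Because the share is computed per x position, this does not rely on the genders having equal group sizes.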

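I managed to find a working alternative to the answers posted by Paul and Stéphane (both of which are great too). The advantage of this approach is that it is general, which saves time when creating many plots.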

library(dplyr)
library(ggplot2)

gender <- c("male", "female", "male", "male", "female", "female", "male", "female", "female", "male")
outcome <- factor(c(0,0,0,1,1,1,0,1,1,1), levels = c(0,1), labels = c("responders", "non-responders"))      
df <- data.frame(gender, outcome)

df %>%
  group_by(gender, outcome) %>%
  summarise(count = n()) %>%
  # summarise() drops the outcome grouping, so sum(count) is taken within each gender
  mutate(pct = round(count/sum(count), 2)) %>%
  ggplot(aes(x = factor(gender), y = pct, fill = factor(outcome))) +
  geom_bar(stat = "identity", width = 0.7) +
  scale_y_continuous(labels = scales::percent_format()) +
  labs(x = "Sex", y = "Percentage", fill = "Outcome") +
  theme_minimal(base_size = 14) +
  # position_stack(0.5) centres each label vertically within its segment
  geom_text(aes(label = paste0(pct*100, "%")), vjust = -0.25, position = position_stack(0.5))

This is the output