Add means to histograms by group in ggplot2

I am following this source to plot histograms by group in ggplot2.

The sample data looks like this:

set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
x <- c(x1, x2)
group <- c(rep("G1", 500), rep("G2", 500))

df <- data.frame(x, group = group)

Code:

# install.packages("ggplot2")
library(ggplot2)

# Histogram by group in ggplot2
ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity")

I thought that adding this line:

  +geom_vline(aes(xintercept=mean(group),color=group,fill=group), col = "red")

would give me what I am looking for, but all I get is a histogram with a single mean instead of one mean per group.

Do you have any suggestions?

I would compute the group means inside the data frame:

library(ggplot2)
library(dplyr)

df %>% 
  group_by(group) %>% 
  mutate(mean_x = mean(x)) 

Output:

# A tibble: 1,000 × 3
# Groups:   group [2]
         x group mean_x
     <dbl> <chr>  <dbl>
 1 -0.962  G1    0.0525
 2 -0.293  G1    0.0525
 3  0.259  G1    0.0525
 4 -1.15   G1    0.0525
 5  0.196  G1    0.0525
 6  0.0301 G1    0.0525
 7  0.0854 G1    0.0525
 8  1.12   G1    0.0525
 9 -1.22   G1    0.0525
10  1.27   G1    0.0525
# … with 990 more rows
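
As a side note on why an explicit grouping step is needed: `mean()` has no notion of groups, so calling it on a whole column yields a single value, and calling it on a character column such as `group` yields `NA` with a warning. The sketch below uses made-up toy values (`toy_x` and `toy_group` are illustrative names, not from the question) and base R's `ave()` as a rough counterpart to `group_by()` plus `mutate()`:

```r
# Toy data, invented for illustration only
toy_x <- c(1, 2, 3, 11, 12, 13)
toy_group <- c(rep("G1", 3), rep("G2", 3))

# mean() of a character vector is NA (R also emits a warning)
na_result <- suppressWarnings(mean(toy_group))

# mean() of the numeric column ignores the groups: one overall value
overall_mean <- mean(toy_x)

# ave() repeats each group's mean on every row of that group,
# much like group_by(group) %>% mutate(mean_x = mean(x))
per_row_means <- ave(toy_x, toy_group)
print(per_row_means)
```

Here `overall_mean` is 7, while `per_row_means` holds 2 for the three G1 rows and 12 for the three G2 rows.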

So do:

library(ggplot2)
library(dplyr)
df %>% 
  group_by(group) %>% 
  mutate(mean_x = mean(x)) %>% 
  ggplot(aes(x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(aes(xintercept = mean_x), col = "red")
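
If the lines should take their group's colour rather than all being red, one possible variation is to summarise to one row per group and map `colour` inside `aes()`. This is only a sketch and not part of the original answer; `means_df` is an illustrative name, and the data are re-created from the question so that the block runs on its own.

```r
library(ggplot2)
library(dplyr)

# Same simulated data as in the question
set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
df <- data.frame(x = c(x1, x2), group = rep(c("G1", "G2"), each = 500))

# One row per group instead of one row per observation
means_df <- df %>%
  group_by(group) %>%
  summarise(mean_x = mean(x))

p <- ggplot(df, aes(x = x, fill = group, colour = group)) +
  # bins = 30 is the default; setting it only silences the message
  geom_histogram(alpha = 0.5, position = "identity", bins = 30) +
  # colour is mapped inside aes(), so each line follows its group
  geom_vline(data = means_df,
             aes(xintercept = mean_x, colour = group),
             linetype = "dashed")
p
```

Setting `col = "red"` outside `aes()`, as in the code above this sketch, is a fixed value that applies to every line, which is why both means come out red there.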

In addition to the previous suggestion, you can also store the group means separately, i.e. two values instead of nrow = 1000 highly redundant ones:

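library(ggplot2)
library(dplyr)

## a 'tidy' way (one of several valid ways to compute groupwise values):
group_means <- df %>%
  group_by(group) %>%
  summarise(group_means = mean(x, na.rm = TRUE)) %>%
  pull(group_means)

## same histogram as before, plus one vertical line per group mean
ggplot(df, aes(x = x, fill = group, colour = group)) +
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(xintercept = group_means)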

A direct way without precomputing anything is:

ggplot(df, aes(x = x, fill = group, colour = group)) + 
  geom_histogram(alpha = 0.5, position = "identity") +
  geom_vline(xintercept = tapply(df$x, df$group, mean), col = "red")
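
For readers unfamiliar with `tapply()`: it splits its first argument by the second and applies the function to each piece, returning one value per group, named after the group levels. A small sketch that re-creates the question's data to inspect that result (`means_by_group` is just an illustrative name):

```r
# Same simulated data as in the question
set.seed(3)
x1 <- rnorm(500)
x2 <- rnorm(500, mean = 3)
df <- data.frame(x = c(x1, x2), group = rep(c("G1", "G2"), each = 500))

# One mean per level of df$group
means_by_group <- tapply(df$x, df$group, mean)

names(means_by_group)   # "G1" "G2"
length(means_by_group)  # 2

# The G1 entry is the mean of the first sample
all.equal(as.vector(means_by_group["G1"]), mean(x1))
```

Because the result has exactly two elements, `geom_vline()` draws exactly two lines, one at each group mean.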