Preventing wrong density plots when coloring histograms according to groups

Question

Based on some dummy data, I created a histogram with a density plot:

set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each=200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)
a <- ggplot(wdata, aes(x = weight))

# Baseline: no colouring by 'sex' yet
a + geom_histogram(aes(y = ..density..),
                   colour = "black",
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

The histogram of weight should be colored according to sex, so I use aes(y = ..density.., color = sex) in geom_histogram():

# The fixed colour = "black" is dropped here; it would override the mapped colour
a + geom_histogram(aes(y = ..density..,
                       color = sex),
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

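As I want it to, the density plot stays the same (computed over both groups together), but the histogram is scaled up (each group now seems to be normalized separately):

How can I prevent this from happening? I need individually colored histogram bars, but a joint density plot for all colored groups.

P.S. Using aes(color = sex) in geom_density() brings everything back to the original scale, but I don't want separate density plots (like below):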

a + geom_histogram(aes(y = ..density..,
                       color = sex),
                   fill = "white",
                   position = "identity") +
  # One density curve per group (not what I want)
  geom_density(alpha = 0.2,
               aes(color = sex)) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

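Edit:

As suggested, dividing by the number of groups inside geom_histogram()'s aesthetics, i.e. y = ..density../2, approximates the solution. However, this only works for symmetric distributions, as in the first output below: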

# Divide each group's density by the number of groups (2)
a + geom_histogram(aes(y = ..density../2,
                       color = sex),
                   fill = "white",
                   position = "identity") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF"))

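which yields

However, less symmetric distributions can cause problems with this approach. See below, where y = ..density../5 was used for 5 groups. The original comes first, the manipulated version second (position = "stack"):

Because the data are heavier on the left, dividing by 5 underestimates on the left and overestimates on the right.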
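
A likely reason the shortcut is only approximate: within each group, ..density.. is count / (n_group * binwidth), whereas the joint scale needs count / (N * binwidth). The exact per-group factor is therefore n_group / N, which equals 1 / (number of groups) only when all groups have the same number of observations. The base-R sketch below checks this numerically; the data and the names x, g and breaks are made up for illustration and are not the data behind the plots above.

```r
# Hypothetical data: two groups of unequal size (not the data from the question)
set.seed(1)
x <- c(rnorm(300, 55), rnorm(100, 58))
g <- rep(c("A", "B"), times = c(300, 100))

binwidth <- 0.5
breaks <- seq(floor(min(x)), ceiling(max(x)), by = binwidth)

# Per-group bin counts on a shared set of bins (one column per group)
counts <- sapply(split(x, g),
                 function(v) hist(v, breaks = breaks, plot = FALSE)$counts)

n_g <- colSums(counts)  # observations per group
N   <- sum(counts)      # all observations

# Per-group density: each column integrates to 1 on its own
per_group <- sweep(counts, 2, n_g * binwidth, "/")

# Joint scale: the bars of all groups together integrate to 1
joint <- counts / (N * binwidth)

# Shortcut (divide by number of groups) vs. exact factor n_g / N
approx_joint <- per_group / ncol(counts)
exact_joint  <- sweep(per_group, 2, n_g / N, "*")

max(abs(exact_joint - joint))   # numerically zero
max(abs(approx_joint - joint))  # clearly non-zero for unequal group sizes
```

With equal group sizes the two versions coincide, which would explain why the 2-group dummy data looked fine.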

Edit 2: Solution

Following Andrew's suggestion, the following (complete) code solves the problem:

library(ggplot2)
set.seed(1234)
wdata = data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

binwidth <- 0.25
a <- ggplot(wdata,
            aes(x = weight,
                # Pass binwidth to aes() so it will be found in
                # geom_histogram()'s aes() later
                binwidth = binwidth))

# Basic plot w/o colouring according to 'sex'
a + geom_histogram(aes(y = ..density..),
                   binwidth = binwidth,
                   colour = "black",
                   fill = "white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25))


# Plot w/ colouring according to 'sex'
a + geom_histogram(aes(x = weight,
                       # binwidth will only be found if passed to
                       # ggplot()'s aes() (as above)
                       y = ..count.. / (sum(..count..) * binwidth),
                       color = sex),
                   binwidth = binwidth,
                   fill="white",
                   position = "stack") +
  geom_density(alpha = 0.2) +
  scale_color_manual(values = c("#868686FF", "#EFC000FF")) +
  # Use fixed scale for sake of comparability
  scale_x_continuous(limits = c(52, 61)) +
  scale_y_continuous(limits = c(0, 0.25)) +
  # "none" hides the legend; FALSE is deprecated in recent ggplot2 versions
  guides(color = "none")

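Note: binwidth = binwidth needs to be passed to ggplot()'s aes(); otherwise geom_histogram()'s aes() will not find the pre-specified binwidth. In addition, position = "stack" is specified so that both versions of the histogram are comparable. Plots for the dummy data and for a more complex distribution are below:

Solved - thanks for your help!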

Answer 1

I don't think you can do this using y = ..density.., but you can recreate the same thing like this:

binwidth <- 0.25 #easiest to set this manually so that you know what it is

a + geom_histogram(aes(y = ..count.. / (sum(..count..) * binwidth),
                       color = sex), 
                   binwidth = binwidth,
                   fill="white",
                   position = "identity") +
    geom_density(alpha = 0.2) +
    scale_color_manual(values = c("#868686FF", "#EFC000FF"))
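
On more recent ggplot2 versions (3.3.0 or later, as far as I know) the same rescaling can probably be written with after_stat(). The computed variable width from stat_bin can then stand in for the manually set binwidth. This is only a sketch of an alternative spelling, assuming the wdata data frame from the question; it is not what the code above uses:

```r
library(ggplot2)

# Dummy data as in the question
set.seed(1234)
wdata <- data.frame(
  sex = factor(rep(c("F", "M"), each = 200)),
  weight = c(rnorm(200, 55), rnorm(200, 58))
)

p <- ggplot(wdata, aes(x = weight)) +
  geom_histogram(
    # count and width are computed by stat_bin; sum(count) runs over all
    # groups in the layer, so the bars are scaled to the joint density
    aes(y = after_stat(count / (sum(count) * width)), colour = sex),
    binwidth = 0.25,
    fill = "white",
    position = "identity"
  ) +
  # Single density curve over both groups
  geom_density(alpha = 0.2) +
  scale_colour_manual(values = c("#868686FF", "#EFC000FF"))

p
```

Note that sum(count) is taken over the whole layer, so with facets the normalization would span all panels rather than each panel separately.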

Tags: r, histogram, ggplot2, density-plot