Overlay KDE and filled histogram with ggplot2 (R)

Question

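I'm new to R and I'm struggling to overlay a filled histogram split into 6 classes with a KDE based on the whole distribution (not the individual distributions of the 6 classes). I have a dataset with 4 columns (data1, data2, data3, origin); all the data are continuous, and origin is my category (a geographical location). I can plot the histogram of data1 with the 6 classes, but when I add the KDE curve it is also split into 6 curves (one per class). I think I understand that I have to override the first aes argument and supply a new one when calling geom_density, but I can't find out how to do that.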

Translating my problem to the iris dataset: I want a single KDE curve for Sepal.Length, not one KDE curve of Sepal.Length per species. Here is my code and the result with the iris data.

ggplot(data=iris, aes(x=Sepal.Length, fill=Species)) +
    geom_histogram() +
    theme_minimal() +
    geom_density(kernel="gaussian", bw= 0.1, alpha=.3)

Answer 1

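The problem is that the histogram shows counts, whose bars add up to the total number of observations, whereas the density plot shows a density, which integrates to 1. To make the two compatible, you have to use the 'computed variables' of the stat part of the layers, which are accessible through after_stat(). You can either scale the density so that it matches the counts, or scale the histogram so that it integrates to 1.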
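To see the mismatch in numbers, here is a minimal base-R sketch (it is not part of the original answer). The bin width of 0.2, the break points, and the variable names are assumptions picked for illustration only.

```r
# Numerical check with base R only; no ggplot2 needed.
x     <- iris$Sepal.Length
bin_w <- 0.2                                  # assumed bin width

# Histogram of counts on fixed breaks that cover the data range (4.3 to 7.9).
h <- hist(x, breaks = seq(4, 8, by = bin_w), plot = FALSE)
hist_area <- sum(h$counts * bin_w)            # bar area = n * bin width, not 1

# Gaussian KDE; its area is approximated with the trapezoid rule.
d <- density(x, bw = 0.1, kernel = "gaussian")
dens_area <- sum(diff(d$x) * (head(d$y, -1) + tail(d$y, -1)) / 2)

c(histogram = hist_area, density = dens_area)
```

The histogram area comes out as the number of observations times the bin width, while the KDE area is close to 1, so one of the two layers has to be rescaled before they can share a y axis.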

Scaling the histogram to density:

library(ggplot2)
ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(aes(y = after_stat(density)),
                 position = 'identity') +
  geom_density(bw = 0.1, alpha = 0.3)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Scaling the density to counts. To do this properly, you should multiply the count computed variable by the binwidth parameter of the histogram.

ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.2, position = 'identity') +
  geom_density(aes(y = after_stat(count * 0.2)),
               bw = 0.1, alpha = 0.3)

Created on 2021-06-22 by the reprex package (v1.0.0)
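In the count-scaled version the bin width appears twice, once in geom_histogram() and once inside after_stat(). A small variation, offered as a sketch rather than as the answerer's code, keeps it in one variable so the two cannot drift apart. The name bin_w is hypothetical.

```r
library(ggplot2)

bin_w <- 0.2  # hypothetical name; single place to change the bin width

p <- ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = bin_w, position = "identity") +
  # count * bin_w puts the density on the same scale as the bar heights
  geom_density(aes(y = after_stat(count * bin_w)),
               bw = 0.1, alpha = 0.3)
p
```

This still draws one density curve per species, just like the example above it.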

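As a side note: the default position argument of the histogram stacks the bars on top of one another. Setting position = "identity" prevents this. Alternatively, you could also set position = "stack" in the density layer.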

EDIT: Sorry, I seem to have overlooked the 'I want 1 KDE for the entire Sepal.Length' part of the question. You have to set the group manually, like so:

ggplot(iris, aes(Sepal.Length, fill = Species)) +
  geom_histogram(binwidth = 0.2) +
  geom_density(bw = 0.1, alpha = 0.3, 
               aes(group = 1, y = after_stat(count * 0.2)))
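An alternative to group = 1, sketched here under the assumption that only the histogram needs the per-species fill, is to map fill in the histogram layer instead of in ggplot(). The density layer then never sees Species and is computed on the whole of Sepal.Length. This is not from the original answer.

```r
library(ggplot2)

p <- ggplot(iris, aes(Sepal.Length)) +
  # fill is mapped only here, so only the histogram is split by species
  geom_histogram(aes(fill = Species), binwidth = 0.2) +
  # no grouping variable reaches this layer: a single KDE, scaled to counts
  geom_density(aes(y = after_stat(count * 0.2)),
               bw = 0.1, alpha = 0.3)
p
```

If the curve should be shaded, a constant fill can be passed to geom_density() outside aes().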

Answer 2

I also found a nice tutorial on sthda.com that combines geom_histogram() and geom_density() with matching scales:

http://www.sthda.com/english/wiki/ggplot2-density-plot-quick-start-guide-r-software-and-data-visualization#combine-histogram-and-density-plots

The reprex there is:

set.seed(1234)
df <- data.frame(
  sex=factor(rep(c("F", "M"), each=200)),
  weight=round(c(rnorm(200, mean=55, sd=5),
                 rnorm(200, mean=65, sd=5)))
  ) 
library(ggplot2) 
ggplot(df, aes(x=weight, color=sex, fill=sex)) + 
 # after_stat(density) replaces the tutorial's ..density.., deprecated since ggplot2 3.4.0
 geom_histogram(aes(y=after_stat(density)), alpha=0.5, position="identity") +
 geom_density(alpha=.2)
