ggplot density plot produces unexpected result

Question

I have a question about drawing density plots with ggplot. To illustrate the problem, I created the following example data:

library(data.table)
library(ggplot2)
DT2 <- data.table(Rating = 1:19,
            Nndef = c(50, 30, 70, 70, 60, 40, 60, 30, 30, 10,
                      5, 3, 1, 0, 0, 0, 0, 0, 0))

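Now I want to draw a density plot of the number of Nndef per rating category. Before doing so, I replicate each row Nndef times, so that each rating category appears Nndef times.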

DT2 <- DT2[rep(1:.N,Nndef)]

Now this should do the trick:

ggplot(DT2, aes(x = Rating)) + theme_bw() +
  geom_density()

This gives me:

This is exactly what I would expect for this data. However, now consider:

DT1 <- data.table(Rating = c(1:19),
            Nndef = c(460, 480, 1300, 2600, 5700, 4700, 9300, 10600, 7700, 8200,
                      6500, 6700, 5300, 4700, 2700, 1100, 1200, 400, 420))
DT1 <- DT1[rep(1:.N,Nndef)]
ggplot(DT1, aes(x = Rating)) + theme_bw() +
  geom_density()

This results in:

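I am familiar with the adjust argument of geom_density, but I generate many of these ggplots in a for loop. I would like smooth density plots (like the first one, made with DT2) without having to tune each plot by hand. Also, I do not understand why the second case produces a distorted density while the first produces a fairly accurate one. Any ideas?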

Thanks in advance.

Answer 1

You can set the adjust factor to a fraction of the number of unique values of 'x' (here, Rating):

ggplot(DT1, aes(x = Rating)) + theme_bw() +
      geom_density(adjust = length(unique(DT1$Rating)) / 10)
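As for why the second plot looks distorted, a likely explanation (my reading, not something stated in the question) is the default bandwidth. geom_density uses bw = "nrd0" by default, i.e. bw.nrd0(), which shrinks in proportion to n^(-1/5). With about 80,000 rows of integer ratings, the bandwidth drops well below the spacing of 1 between ratings, so each rating shows up as its own bump. The sketch below rebuilds the two rating vectors in base R and compares the default bandwidths. The names n1, n2, x1 and x2 are introduced here for illustration, and bw = 1 (one rating step) is an assumed value, not a recommendation from the original posts.

```r
# Counts per rating, copied from the question
n2 <- c(50, 30, 70, 70, 60, 40, 60, 30, 30, 10, 5, 3, 1, 0, 0, 0, 0, 0, 0)
n1 <- c(460, 480, 1300, 2600, 5700, 4700, 9300, 10600, 7700, 8200,
        6500, 6700, 5300, 4700, 2700, 1100, 1200, 400, 420)

# Same replication as DT[rep(1:.N, Nndef)], but as plain vectors
x2 <- rep(1:19, times = n2)
x1 <- rep(1:19, times = n1)

# Default bandwidth rule used by density() and by geom_density()
bw2 <- bw.nrd0(x2)
bw1 <- bw.nrd0(x1)
print(c(DT2 = bw2, DT1 = bw1))  # the DT1 bandwidth is the smaller one

# A fixed bandwidth does not depend on n, so it behaves the same in a loop.
# bw = 1 is an assumption: one rating step on the x axis.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(data.frame(Rating = x1), aes(x = Rating)) + theme_bw() +
    geom_density(bw = 1)
  print(p)
}
```

Passing bw directly instead of adjust keeps the amount of smoothing constant across plots, whereas adjust only rescales whatever default bandwidth each data set happens to get.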

r

ggplot2

density-plot