ggplot: why is the y-scale larger than the actual values for each response?

Question

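This may be a silly question, but I can't seem to find a solution. I'm trying to plot a categorical variable (3 groups) on the x-axis and a continuous variable (a percentage from 0 to 100) on the y-axis. To do this, I have to either set stat = "identity" in geom_bar or use geom_col.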

However, even after following the comments from another question and from "Why is the value of y bar larger than the actual range of y in stacked bar plot?", the y-axis still goes up to 4000.

The plot keeps coming out with a y-axis that runs up to about 4000.

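I also double-checked that the x variable is a factor and the y variable is numeric. Why does the axis still reach 4000 instead of 100, as you would expect for percentages?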

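Edit: The y values are simply the participants' responses. I have a large dataset (N = 600), and each y value is a percentage from 0 to 100 given by one participant. So within each group (N = 200 per group), every participant contributes one percentage value. I want to compare the three groups visually based on the percentages they gave.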

Here is the code I'm using to draw the plot:

library(ggplot2)

# Make sure x is categorical and y is numeric
df$group <- as.factor(df$group)
df$confid <- as.numeric(df$confid)

plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")

Answer 1

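Are you trying to plot the mean percentage in each group? Otherwise, it isn't clear how a bar chart could easily represent what you're looking for. Note that geom_col() draws one segment per row of data and, by default, stacks the segments within each x position, so each bar's height is the sum of all 200 responses in that group rather than a single percentage. You could also add error bars to give a sense of how the responses are spread.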
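To see why the axis ends up in the thousands, here is a base-R sketch comparing the height that stacking effectively produces (the per-group sum) with a single per-group summary (the mean). The data are hypothetical: three groups named "A", "B" and "C", 200 participants each, with responses drawn uniformly between 0 and 100.

```r
# Hypothetical data: 3 groups x 200 participants, one 0-100 percentage each
set.seed(1)
responses <- data.frame(
  group  = factor(rep(c("A", "B", "C"), each = 200)),
  confid = runif(600, min = 0, max = 100)
)

# Stacking one segment per row puts each bar's top at the group sum
bar_heights <- tapply(responses$confid, responses$group, sum)

# One summary value per group (here the mean) stays on the 0-100 scale
group_means <- tapply(responses$confid, responses$group, mean)

print(round(bar_heights))
print(round(group_means, 1))
```

With 200 responses per group, each stacked bar is 200 times the group mean, which is how percentages averaging around 20 produce bars near 4000.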

Suppose your data look like this:

set.seed(4)

df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

Using your plotting code, we get a result very similar to yours:

library(ggplot2)

plot <- ggplot(df, aes(group, confid)) +
  geom_col() +
  ylab("confid %") +
  xlab("group")

plot
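To check the stacking explanation directly, the sketch below rebuilds the same simulated data and plot, then reads back the coordinates ggplot2 actually draws with layer_data(). It assumes ggplot2 is installed and recent enough to export layer_data(); the top of each bar should match the plain sum of that group's responses.

```r
library(ggplot2)

# Same simulated data as above: 3 groups x 200 responses, values 1-40
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

p <- ggplot(df, aes(group, confid)) + geom_col()

# layer_data() returns the data ggplot2 draws for a layer; with the default
# position_stack(), every row becomes a segment stacked within its x position
drawn <- layer_data(p)

# The top of each bar is the largest ymax within each x position
bar_tops <- tapply(drawn$ymax, as.integer(drawn$x), max)

# Per-group sums of the raw responses, for comparison
group_sums <- tapply(df$confid, df$group, sum)

print(bar_tops)
print(group_sums)
```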

However, if we use stat_summary() instead, we can plot the mean and standard error for each group:

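ggplot(df, aes(group, confid)) +
  # Bar height = mean response in each group
  stat_summary(geom = "bar", fun = mean, width = 0.6,
               fill = "deepskyblue", color = "gray50") +
  # With no summary function given, these default to mean_se(): mean +/- 1 SE
  geom_errorbar(stat = "summary", width = 0.5) +
  geom_point(stat = "summary") +
  ylab("confid %") +
  xlab("group")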
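For reference, here is a base-R sketch of the numbers that stat_summary() plot shows for each group: the mean, plus the mean minus and plus one standard error, which is what ggplot2's default mean_se() summary computes. The helper summarise_group() is hypothetical, written only for this illustration, and the data are the same simulated responses as above.

```r
# Simulated data as in the answer: 3 groups x 200 responses, values 1-40
set.seed(4)
df <- data.frame(group = factor(rep(1:3, each = 200)),
                 confid = sample(40, 600, TRUE))

# Hypothetical helper mirroring mean_se() with its default mult = 1:
# y = mean, ymin/ymax = mean -/+ standard error (sd / sqrt(n))
summarise_group <- function(x) {
  m  <- mean(x)
  se <- sqrt(var(x) / length(x))
  c(y = m, ymin = m - se, ymax = m + se)
}

# One row per group: the values behind the bar, point and error bar
summ <- do.call(rbind, lapply(split(df$confid, df$group), summarise_group))
print(summ)
```

Because these are per-group summaries rather than stacked rows, they stay within the range of the individual responses instead of adding up.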


Tags: r, ggplot2, geom-bar