Position problem with geom_bar when using both width and dodge

Question

I have the following data frame:

library(ggplot2)

group1 = c('a', 'b')  # recycled to length 4 by data.frame()
group2 = c('1', '1', '2', '2')
mean = 1:4
sd = c(0.2, 0.3, 0.5, 0.8)
df = data.frame(group1, group2, mean, sd)

I want to plot sd on the chart using geom_errorbar(). This works well:

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge')

Because I want to reduce the width of the error bars, I run:

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge')

So far so good. But then I want to fill by group2:

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge')

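The problem is that the error bars are no longer centered on the bars, and I don't know why. I looked at the documentation but found nothing about this issue. I read this question, Force error bars to be in the middle of bar, and this one, but nobody explains why it happens. One suggested solution is to add position_dodge(0.9):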

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = position_dodge(0.9))

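It works, but I don't know why or what it does. Can someone explain what is happening? Why can't I just add width = 0.2 to reduce the width of the error bars? What does position_dodge(0.9) do, and why do I need it? Why does the problem only appear once I add fill = group2?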

Answer 1

TL;DR: From the start, position = "dodge" (or position = position_dodge(<some width value>)) wasn't doing what you thought it was doing.

Underlying intuition

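position_dodge is one of the position adjustment functions available in the ggplot2 package. If multiple elements belonging to different groups occupy the same location, position_identity does nothing, position_dodge places the elements side by side horizontally, position_stack places them on top of one another vertically, position_fill places them on top of one another vertically and stretches them proportionally to fill the whole plot area, and so on.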

RStudio's ggplot2 cheat sheet has a summary of how the different position adjustment functions behave.

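Note that the elements to be dodged (or stacked, etc.) must belong to different groups. If group = <some variable> is specified explicitly in the plot, it is used as the grouping variable that determines which elements should be dodged from one another. If there is no explicit group mapping in aes(), but there is one or more of color = <some variable>, fill = <some variable>, linetype = <some variable>, and so on, the interaction of all discrete variables is used. From ?aes_group_order: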

By default, the group is set to the interaction of all discrete variables in the plot. This often partitions the data correctly, but when it does not, or when no discrete variable is used in the plot, you will need to explicitly define the grouping structure, by mapping group to a variable that has a different value for each group.
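To see the grouping that ggplot2 actually computes, one option is to inspect the group column returned by layer_data(). The sketch below rebuilds the question's data frame (with the recycling of group1 written out) and counts the distinct groups with and without the fill mapping. The helper name count_groups is made up for illustration.

```r
library(ggplot2)

df <- data.frame(
  group1 = c("a", "b", "a", "b"),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Hypothetical helper: number of distinct groups ggplot2 assigns in layer 1.
count_groups <- function(plot) {
  length(unique(layer_data(plot, 1)$group))
}

p_no_fill <- ggplot(df, aes(x = group1, y = mean)) +
  geom_col(position = "dodge")

p_fill <- ggplot(df, aes(x = group1, y = mean, fill = group2)) +
  geom_col(position = "dodge")

n_no_fill <- count_groups(p_no_fill)
n_fill    <- count_groups(p_fill)

n_no_fill
n_fill
```

Dodging only has a visible effect when more than one group shares an x position, so the plot with fill = group2 should report more groups than it has x positions, while the plot without it should not.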

Plot-by-plot breakdown

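Let's start with your original plot. Since the plot's aesthetic mappings contain no grouping variable that separates the elements sharing an x position, position = "dodge" did nothing.

We can replace it with position = "identity" in both geom layers (in fact, position = "identity" is the default for geom_errorbar, so there is no need to spell it out), and the resulting plot is the same.

Increasing the transparency makes it obvious that the two bars occupy the same position, one "behind" the other.

I assume this original plot isn't what you actually wanted? It is rare for one bar hiding behind another like this to make sense...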

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd),
                position = 'dodge') +
  ggtitle("original plot")

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = "identity") + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("remove position dodge")

ggplot(data = df, aes(x=group1, y = mean))+
  geom_col(position = "identity", alpha = 0.5) + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
  ggtitle("increase transparency")

I'll skip the second plot, since adding width = 0.2 didn't change anything fundamental.

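In the third plot, position = "dodge" finally comes into play, because there is now a grouping variable. The bars and the error bars are each shifted according to their own layer's width. That is the expected behaviour when you use position = "dodge" rather than position = position_dodge(width = <some value>, ...): the dodge distance follows the geom layer's width by default, unless it is overridden by a specific value in position_dodge(width = ...).

If the geom_errorbar layer had kept its default width (which is the same as geom_col's default width), the elements of both layers would have been dodged by the same amount.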

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                position = 'dodge') +
  ggtitle("third plot")

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = 'dodge') + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), 
                position = 'dodge') +
  ggtitle("with default width")
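The alignment can also be checked numerically rather than by eye. The sketch below pulls the computed positions out of each layer with layer_data() and compares the horizontal centres of the bars with those of the error bars. The helpers layer_centres and make_plot are hypothetical names introduced only for this check, and it assumes both layers expose xmin and xmax columns in their built data.

```r
library(ggplot2)

df <- data.frame(
  group1 = c("a", "b", "a", "b"),
  group2 = c("1", "1", "2", "2"),
  mean   = 1:4,
  sd     = c(0.2, 0.3, 0.5, 0.8)
)

# Hypothetical helper: sorted horizontal centres of all elements in layer i.
layer_centres <- function(plot, i) {
  d <- layer_data(plot, i)
  sort(as.numeric((d$xmin + d$xmax) / 2))
}

# Hypothetical helper: the third plot, with a configurable errorbar position.
make_plot <- function(errorbar_position) {
  ggplot(df, aes(x = group1, y = mean, fill = group2)) +
    geom_col(position = "dodge") +
    geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2,
                  position = errorbar_position)
}

p_default <- make_plot("dodge")             # dodge distance follows width = 0.2
p_fixed   <- make_plot(position_dodge(0.9)) # dodge distance matches the bars

aligned_default <- isTRUE(all.equal(layer_centres(p_default, 1),
                                    layer_centres(p_default, 2)))
aligned_fixed   <- isTRUE(all.equal(layer_centres(p_fixed, 1),
                                    layer_centres(p_fixed, 2)))

aligned_default
aligned_fixed
```

If the explanation here holds, only the position_dodge(0.9) version should report matching centres.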

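Side note: we know that geom_errorbar and geom_col have the same default width because they set up their data in the same way. The following line of code can be found in both GeomErrorbar$setup_data and GeomCol$setup_data: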

data$width <- data$width %||% params$width %||% (resolution(data$x, FALSE) * 0.9)
# i.e. if width is specified as one of the aesthetic mappings, use that;
#      else if width is specified in the geom layer's parameters, use that;
#      else, use 90% of the dataset's x-axis variable's resolution.        <- default value of 0.9
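
The fallback chain in that line can be tried out in isolation. This is a minimal base-R sketch, not ggplot2's own code: the operator is redefined locally as a stand-in, resolve_width is a hypothetical helper, and x_resolution stands in for the value of resolution(data$x, FALSE), assumed to be 1 for adjacent discrete positions.

```r
# Local stand-in for the null-default operator used in the quoted line.
`%||%` <- function(a, b) if (is.null(a)) b else a

# Hypothetical helper mirroring the fallback order:
# aesthetic width, then layer parameter width, then 90% of the resolution.
resolve_width <- function(aes_width = NULL, param_width = NULL,
                          x_resolution = 1) {
  aes_width %||% param_width %||% (x_resolution * 0.9)
}

w_default <- resolve_width()                    # nothing specified
w_param   <- resolve_width(param_width = 0.2)   # e.g. width = 0.2 in the layer
w_aes     <- resolve_width(aes_width = 0.5, param_width = 0.2)

c(w_default, w_param, w_aes)
# → 0.9 0.2 0.5
```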

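In summary, when you have different aesthetic groups, the width specified in position_dodge determines how far each element is moved, while the width specified in each geom layer determines each element's... well, width. As long as the different geom layers are dodged by the same amount, they will line up with one another.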
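
The arithmetic behind this can be sketched in base R. The formula below is a simplified reading of how dodging spaces n elements evenly across the dodge width; it is an assumption made for illustration, not ggplot2's actual implementation, and dodge_centres is a made-up name.

```r
# Simplified sketch (assumed formula): centre of each of the n elements that
# share position x when they are spread over a total dodge width.
dodge_centres <- function(x, n, dodge_width) {
  k <- seq_len(n)
  x + dodge_width * ((k - 0.5) / n - 0.5)
}

# Two groups at x = 1, as in the question's filled plot.
bar_centres    <- dodge_centres(x = 1, n = 2, dodge_width = 0.9)  # geom_col default
narrow_centres <- dodge_centres(x = 1, n = 2, dodge_width = 0.2)  # errorbar, "dodge" + width = 0.2
fixed_centres  <- dodge_centres(x = 1, n = 2, dodge_width = 0.9)  # errorbar, position_dodge(0.9)

bar_centres     # → 0.775 1.225
narrow_centres  # → 0.95 1.05
fixed_centres   # → 0.775 1.225
```

Under this reading, the narrow error bars end up closer to the centre of the pair than the bars they belong to, and giving them the same dodge width as the bars puts them back on the bar centres.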

Below is an arbitrary example to illustrate. It uses a different width for each layer (0.5 for geom_col, 0.9 for geom_errorbar) but the same dodge width (0.6):

ggplot(data = df, aes(x=group1, y = mean, fill = group2))+
  geom_col(position = position_dodge(0.6), width = 0.5) + 
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.9,
                position = position_dodge(0.6)) +
  ggtitle("another example")
