ggplot: weighted.mean and stat_summary in a facetted bar plot

Question

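I have spent far too much time trying to find a way to use weighted.mean (or wtd.mean) inside stat_summary and get it to work properly. I looked at several pages that tackle the same issue, but none had a definitive solution. The main problem is that weighted.mean, once placed inside stat_summary, cannot find its weight component, which apparently cannot be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples). I have tried various approaches, and even produced a bar chart of weighted means with a ddply-based function (as suggested on another page), but besides being a bit clunky, it does not allow faceting because it changes the source data frame.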

Below is a data frame built specifically for this problem.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

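(Note: I also added the "shapes" column even though I won't use it, just to remind myself that I don't want to lose any data along the way; it needs to be available at the end.)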

The initial setup is a ggplot using "mean", including faceting:

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

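Below is the corresponding code with "weighted.mean". The "w" aesthetic is ignored, so all weights are assumed to be equal (as per the weighted.mean function definition), which results in a simple mean.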

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it gives a warning: Warning: Ignoring unknown aesthetics: w

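I tried several ways to make it "see" the weight variable, without success. Eventually I realized that the most promising approach was to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean would still not see any "w" aesthetic, but it would compute one as a default. To do this, I tried nesting the native function (weighted.mean) inside a wrapper function so that I could change its arguments.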

Step by step.

First, I tried it with "mean" (and it worked).

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean:

weighted.mean.modif <- function(x,w) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

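But it still does not read "w" (because no "w" is specified), so it returns a plain mean.

Then I tried setting the default of the "w" argument to the weight column of the data frame: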

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

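But it does not work. The warning message says: Computation failed in `stat_summary()`: 'x' and 'w' must have the same length

Stuck, I tried generating a random sequence of numbers with the same length as "x", and surprisingly it worked.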

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

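Clearly there is a way to trick it, but that is of no use if I can only use random weights.

I tried printing "x" inside the function and then applying it, and although it produced something, even "mean" no longer worked properly.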

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

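So the tricky part I cannot figure out is how to tie the default of "w" to "x" correctly, so that when weighted.mean is called inside stat_summary it uses the right weights anyway, even though it never reads a "w" aesthetic.

As I mentioned, there is also a ddply workaround for getting a plot of weighted means. It is based on creating a new source data frame that contains only the already-organized variables and the weighted means, so it does not allow faceting!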

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

Thanks!

Answer 1

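ggplot's built-in summary functions are not always helpful; quite often you are better off computing your summary in a separate step and then plotting it.

Your basic example plot is actually incorrect. It shows the means of delta and epsilon for "aether" as 5 and 7, which is clearly not the case in the raw data (both values are 1). Those are instead the values of the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order while you are passing in raw vectors (platonic$value rather than simply value), so the values end up in the wrong panels. When using ggplot, you should always pass bare, unquoted column names so that ggplot can work out how to handle the associated data.

The correct version of the basic plot is: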

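library(ggplot2)

# Map bare column names so ggplot keeps the values aligned with their facets.
# (In ggplot2 >= 3.3.0, `fun.y` is deprecated in favour of `fun`.)
g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)
print(g)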

As for using weighted.mean, as I said above, the only sensible thing to do here is to compute it separately and plot the result:

library(dplyr)

# weighted.mean() takes its weights through the `w` argument.
platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))

Since the resulting data frame still contains all the column names used in the first plot, you can simply swap in the new data set:

g.weighted <- g %+% platonic.weighted

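In this example the two plots are identical, because every combination of elements, existence and greek_letter contains exactly one row, so the weighted mean of each group is just that row's value. Your mileage may vary with other data.

Your question is a bit unclear about what your intended end result is, but based on the examples given, I assume you want a weighted mean for each Greek letter. We can easily do that with summarize, or, if you really want to, you can use mutate instead to add a column of weighted means without losing the original data: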
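To see a case where the weighted and simple means actually differ, here is a minimal sketch that groups more coarsely, by existence and greek_letter, so that each group spans several elements. It assumes dplyr >= 1.0 (for the `.groups` argument); the name by_existence and the choice to facet by existence alone are mine, not part of the question. The data frame is rebuilt with rep() only to keep the example self-contained. For the real elements, delta comes out with a simple mean of 7.5 and a weighted mean of about 8.07.

```r
library(dplyr)
library(ggplot2)

# Rebuild the question's data compactly (same values as the original vectors).
platonic <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  existence    = rep(c("real", "not real"), times = c(24, 6)),
  value        = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  importance   = rep(c(20, 10, 3, 9, 50), each = 6)
)

# Summarise per facet variable and letter; each "real" group now has four rows.
by_existence <- platonic %>%
  group_by(existence, greek_letter) %>%
  summarize(
    simple   = mean(value),
    weighted = weighted.mean(value, w = importance),
    .groups  = "drop"
  )

# The facet variable survives the summary, so faceting still works.
p <- ggplot(by_existence, aes(x = greek_letter, y = weighted)) +
  geom_col() +
  facet_wrap(~existence)
print(p)
```

Because the grouping keeps every column the plot needs, nothing about the facets has to be reconstructed afterwards; any other facet variable would simply be added to group_by().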

# mutate() keeps every original row and adds the per-letter weighted mean.
platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))
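One detail worth checking when calling weighted.mean: its weight argument is named `w`. A differently named argument such as `weights` is swallowed by `...` and silently ignored, so the call falls back to a plain mean. The sketch below uses only base R and the five "delta" rows from the question's data (the variable names are mine) to compare the result against the definition sum(x * w) / sum(w), which here is 389 / 92.

```r
# Values and weights for "delta" across the five elements in the question's data.
x <- c(5, 20, 1, 4, 1)
w <- c(20, 10, 3, 9, 50)

by_hand  <- sum(x * w) / sum(w)            # definition of the weighted mean
correct  <- weighted.mean(x, w = w)        # weights passed through `w`
misnamed <- weighted.mean(x, weights = w)  # `weights` lands in `...` and is ignored

# by_hand and correct agree; misnamed equals the unweighted mean(x).
print(c(by_hand = by_hand, correct = correct, misnamed = misnamed, plain = mean(x)))
```

If a weighted summary looks suspiciously like the unweighted one, comparing it against this by-hand formula for a single group is a quick sanity check.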
