ggplot: weighted.mean and stat_summary in a facetted bar plot

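I have spent far too much time trying to work out how to use weighted.mean (or wtd.mean) inside stat_summary and get it to work properly. I have looked at several pages that tackle the same problem, but none of them offers a definitive solution. The main problem is that weighted.mean, once placed inside stat_summary, cannot find its weights argument, which apparently cannot be passed down from the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples). I have tried various approaches, and I even produced a bar plot of weighted means using a ddply-based function (as suggested on another page), but besides being a bit clunky, it does not allow faceting, because it changes the source data frame.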

Below is a data frame built specifically for this question.

elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)

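(Note: I also added the "shapes" column even though I won't use it, just as a reminder that I don't want to lose any data along the way; it needs to still be available at the end.)

The initial setup is a ggplot using "mean", with faceting, like this: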

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)

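Here is the corresponding code, but with "weighted.mean". The "w" aesthetic is ignored, so all weights are assumed to be equal (as per the definition of the weighted.mean function), which results in a simple mean: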

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

As you can see, it gives a warning: "Warning: Ignoring unknown aesthetics: w"
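To illustrate why the result is a simple mean: as far as I can tell, stat_summary hands the summary function only the y values of each group, and weighted.mean called without weights computes an ordinary mean. A minimal base-R sketch (the toy vectors x and w are made up for illustration and are not part of the platonic data):

```r
# Toy values and weights, made up for illustration only
x <- c(0, 10)
w <- c(1, 3)

plain    <- weighted.mean(x)         # no weights supplied: ordinary mean
weighted <- weighted.mean(x, w = w)  # (0*1 + 10*3) / (1 + 3)

print(c(plain = plain, weighted = weighted))
```

The two numbers differ only because the second call actually receives a weights vector, which is exactly what never happens inside stat_summary here.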

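I tried several ways to make it "see" the weights variable, without success. In the end I realized that the most promising approach was to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean still would not see any "w" aesthetic, but it would compute one as a default. To do this, I tried wrapping the native function (weighted.mean) in a custom function so that I could change its arguments.

Step by step.

First, I tried it with "mean" (and it worked).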

mean.modif <- function(x) {
  mean(x)
}

ggplot(data = platonic)+
      stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

Then with weighted.mean:

weighted.mean.modif <- function(x,w) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

But it still does not read "w" (because no "w" is specified), so it returns an ordinary mean.

Then I tried setting the "w" argument to the weights column of the data frame:

weighted.mean.modif1 <- function(x,w=platonic$importance) {
  weighted.mean(x,w)
}

ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

But it does not work. The warning message says: Computation failed in `stat_summary()`: 'x' and 'w' must have the same length
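The length error is consistent with stat_summary calling the function once per group (one bar at a time), so "x" holds only that group's values while the default "w" still holds all 30 weights. A base-R sketch that mimics one such call outside ggplot; the vectors are rebuilt with the same values as the platonic columns above, and x_group is a hypothetical name:

```r
# Same values as the value, importance and greek_letter columns of platonic
value <- c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0)
importance <- rep(c(20, 10, 3, 9, 50), each = 6)
greek_letter <- rep(c("alpha","beta","gamma","delta","epsilon","zeta"), times = 5)

# Without facets, each bar summarises the values that share one greek_letter
x_group <- value[greek_letter == "delta"]
length(x_group)     # 5
length(importance)  # 30

# Mimic the failing call: 5 values against all 30 weights
result <- tryCatch(
  weighted.mean(x_group, importance),
  error = function(e) conditionMessage(e)
)
print(result)
```

The function has no way of knowing which 5 of the 30 weights belong to the current group, which is why a fixed default for "w" cannot work.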

Being stuck, I tried generating a sequence of random numbers with the same length as "x", and to my surprise it worked.

weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
  weighted.mean(x,w)
}
ggplot(data = platonic)+
  stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)

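Clearly there is a way to trick it, but that is of no use if I can only use random weights.

I tried printing "x" inside the function and then applying it; although it produced something, even "mean" no longer worked properly.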

mean.modif3 <- function(x) {
  mean(x)
  print(x)
}
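A likely reason mean.modif3 misbehaves: an R function returns the value of its last expression, and print(x) returns x itself, so the whole vector comes back instead of a single mean. A sketch of a variant that prints first and returns the mean last (mean.modif4 is a hypothetical name used only for this illustration):

```r
# Hypothetical variant: print for debugging, then return the mean
mean.modif4 <- function(x) {
  print(x)  # shows the values this call received
  mean(x)   # last expression, so this is the return value
}

out <- mean.modif4(c(1, 2, 6))
print(out)
```

Used as the summary function, a variant like this should show one printed vector per bar, which also makes the group-by-group calls visible.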

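So the tricky part I cannot figure out is how to tie the default "w" correctly to "x", so that when weighted.mean is called inside stat_summary it uses the right weights anyway, even though it never reads "w".

As I mentioned, there is also a ddply workaround for getting a weighted-mean plot. It is based on creating a new source data frame that contains only the grouping variable and the weighted means, so it does not allow faceting!!!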

weighted.fictious <- function(xxxx, yyyy) {
  ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}

ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
  geom_bar(stat = "identity")

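Thanks!

ggplot's built-in summary functions are not always useful, and very often you are better off computing your summary in a separate step and then plotting it.

Your basic example plot is actually incorrect. It shows the means for delta and epsilon of "aether" as 5 and 7 respectively, which is clearly not the case in the raw data (both values are 1). Those are instead the values of the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order while you are passing in raw vectors (platonic$value rather than simply value), so the values end up in the wrong panels. When using ggplot, you should always pass bare, unquoted column names so that ggplot can work out how to handle the associated data.

A correct version of the basic plot is: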

g <- ggplot(data = platonic)+
  stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
  facet_wrap(~elements~existence)
print(g)

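As for using weighted.mean, as I said above, the only sensible thing to do here is to compute it separately and plot the result:

library(dplyr)

# weighted.mean() takes its weights through the `w` argument
platonic.weighted <- platonic %>% 
  group_by(elements, existence, greek_letter) %>% 
  summarize(value = weighted.mean(value, w = importance))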

Since the resulting data frame still contains all the column names used in the first plot, you can simply swap in the new data set:

g.weighted <- g %+% platonic.weighted

In this example the two plots are identical, but your mileage may vary.
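The two plots match here because every elements/existence/greek_letter combination contains a single row, so a weighted mean over one value is just that value. A quick check, assuming dplyr is installed; the grouping columns are rebuilt with rep() to match the platonic data, and platonic_groups and group_sizes are hypothetical names:

```r
library(dplyr)

# Grouping columns only, same content as in the platonic data frame
platonic_groups <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  existence    = rep(c("real", "not real"), times = c(24, 6)),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5)
)

# Number of rows in each group used for the faceted plot
group_sizes <- platonic_groups %>%
  count(elements, existence, greek_letter)

print(max(group_sizes$n))
```

With more than one row per group, the weighted and unweighted plots would generally differ.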

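Your question is a little unclear about what final result you expect, but based on the examples given, I assume you want a weighted mean for each Greek letter. We can easily do that with summarize, or, if you really want, you can use mutate instead to add a column of weighted means without losing the original data:

platonic.weighted <- platonic %>% 
  group_by(greek_letter) %>% 
  mutate(weighted.letter = weighted.mean(value, w = importance))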
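Putting the pieces together, here is a hedged sketch of a faceted weighted-mean bar plot. Faceting by existence is only my assumption about what you want, by_existence and p are hypothetical names, and the code assumes dplyr >= 1.0.0 (for .groups) and a ggplot2 version that provides geom_col:

```r
library(dplyr)
library(ggplot2)

# Same content as the platonic data frame from the question (shapes omitted)
platonic <- data.frame(
  elements     = rep(c("water", "air", "earth", "fire", "aether"), each = 6),
  greek_letter = rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5),
  existence    = rep(c("real", "not real"), times = c(24, 6)),
  value        = c(0,0,0,5,7,0, 0,1,0,20,3,0, 0,2,2,1,8,0, 0,8,10,4,2,0, 0,0,0,1,1,0),
  importance   = rep(c(20, 10, 3, 9, 50), each = 6)
)

# One weighted mean per greek_letter within each facet
by_existence <- platonic %>%
  group_by(existence, greek_letter) %>%
  summarize(value = weighted.mean(value, w = importance), .groups = "drop")

# Bars are drawn from the precomputed values, so no stat_summary is needed
p <- ggplot(by_existence, aes(x = greek_letter, y = value)) +
  geom_col() +
  facet_wrap(~existence)
print(p)
```

Because the facet variable is kept in group_by, it survives the summary step and faceting keeps working; add any other column you want to facet on in the same way.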