ggplot: weighted.mean and stat_summary in a faceted bar plot
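I have spent far too much time trying to get weighted.mean (or wtd.mean) to work inside stat_summary.
I looked at several pages that tackle the same problem, but none of them had a clear solution.
The main issue is that weighted.mean, once placed inside stat_summary, cannot find its weights, which apparently cannot be passed down through the ggplot and/or stat_summary aesthetics (believe me, I tried; see the examples).
I have tried various approaches, and even produced a bar plot of weighted means with a ddply-based function (as suggested on another page), but besides being somewhat clumsy it does not allow faceting, because it changes the source data frame.
Below is a data frame built specifically for this question.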
elements <- c("water","water","water","water","water","water","air","air","air","air","air","air","earth","earth","earth","earth","earth","earth","fire","fire","fire","fire","fire","fire","aether","aether","aether","aether","aether","aether")
shapes <- c("icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","icosahedron","octahedron","octahedron","octahedron","octahedron","octahedron","octahedron","cube","cube","cube","cube","cube","cube","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","tetrahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron","dodecahedron")
greek_letter <- c("alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta","alpha","beta","gamma","delta","epsilon","zeta")
existence <- c("real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","real","not real","not real","not real","not real","not real","not real")
value <- c(0,0,0,5,7,0,0,1,0,20,3,0,0,2,2,1,8,0,0,8,10,4,2,0,0,0,0,1,1,0)
importance <- c(20,20,20,20,20,20,10,10,10,10,10,10,3,3,3,3,3,3,9,9,9,9,9,9,50,50,50,50,50,50)
platonic <- data.frame(elements,shapes,greek_letter,existence,value,importance)
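(Note: I also added the "shapes" column even though I will not use it, just as a reminder that I do not want to lose any data along the way; it needs to be available at the end.)
The initial setup is a ggplot with "mean" that includes faceting, like this: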
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
facet_wrap(~elements~existence)
Below is the corresponding code with "weighted.mean". The "w" aesthetic is ignored, so all weights are assumed equal (per the definition of weighted.mean), which results in a simple mean:
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value, w=platonic$importance), fun.y = "weighted.mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
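As you can see, it gives a warning:
Warning: Ignoring unknown aesthetics: w
I tried several ways to make it "see" the weight variable, without success. In the end I decided the most promising approach was to redefine the weighted.mean function so that its default "w" is a function of "x". weighted.mean would still not see any "w" aesthetic, but it would compute one as a default. To do that, I tried wrapping the native function (weighted.mean) in a function of my own so that I could change its arguments.
Step by step.
First I tried it with "mean" (and it worked).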
mean.modif <- function(x) {
mean(x)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
Then with weighted.mean:
weighted.mean.modif <- function(x,w) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
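But it still does not read "w" (because no "w" is specified), so it returns a plain mean.
Then I tried setting the "w" argument to the weight column of the data frame: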
weighted.mean.modif1 <- function(x,w=platonic$importance) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif1", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
But it does not work. The warning message says:
Computation failed in `stat_summary()`:
'x' and 'w' must have the same length
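The length mismatch can be reproduced outside ggplot. The sketch below assumes that stat_summary hands the summary function only the y values belonging to one bar at a time, while the default `w` is still the full 30-element column; `x_group` is an illustrative stand-in for one such subset, not something ggplot exposes under that name.

```r
# 30 weights, in the same order as platonic$importance
importance <- rep(c(20, 10, 3, 9, 50), each = 6)

# Illustrative subset: the five "delta" values that would form one bar
x_group <- c(5, 20, 1, 4, 1)

# Same idea as weighted.mean.modif1: the default w is the whole column
weighted.mean.modif1 <- function(x, w = importance) {
  weighted.mean(x, w)
}

# 5 values against 30 weights: weighted.mean() stops with an error,
# which is captured here as a character message
msg <- tryCatch(
  weighted.mean.modif1(x_group),
  error = function(e) conditionMessage(e)
)
print(msg)
```

The default argument is evaluated against the whole data frame, so it can never line up with the per-bar subsets.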
Being stuck, I tried generating a sequence of random numbers with the same length as "x", and surprisingly it worked.
weighted.mean.modif2 <- function(x,w=runif(x, min = 0, max = 100)) {
weighted.mean(x,w)
}
ggplot(data = platonic)+
stat_summary(mapping = aes(x=platonic$greek_letter, y=platonic$value), fun.y = "weighted.mean.modif2", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)
Apparently there is a way to trick it, but it is useless if I can only use random weights.
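The reason the random version runs is probably less mysterious than it looks: when the first argument of runif() is a vector longer than one, its length is taken as the number of draws. A minimal sketch, with `x_group` again a made-up subset of values:

```r
# Illustrative subset of y values for a single bar
x_group <- c(5, 20, 1, 4, 1)

# runif() uses length(x_group) as n, so the weights always match x in length
w_random <- runif(x_group, min = 0, max = 100)

print(length(w_random))
print(weighted.mean(x_group, w_random))  # runs, but the weights are meaningless
```

So the lengths agree by construction, which is all weighted.mean checks; the weights themselves carry no information from the data.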
I tried printing "x" inside the function and then applying it; although it produced something, even "mean" no longer worked properly.
mean.modif3 <- function(x) {
mean(x)
print(x)
}
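A likely explanation for the broken result: an R function returns the value of its last expression, and print() returns its argument, so mean.modif3 returns the whole vector instead of the mean. The sketch below contrasts it with a reordered variant; the name `mean.modif3.fixed` is made up for illustration.

```r
# As written in the question: the last expression is print(x), so x is returned
mean.modif3 <- function(x) {
  mean(x)
  print(x)
}

# Reordered: print first, then let mean(x) be the return value
mean.modif3.fixed <- function(x) {
  print(x)
  mean(x)
}

returned_original <- mean.modif3(c(1, 2, 3))
returned_fixed <- mean.modif3.fixed(c(1, 2, 3))
```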
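So the tricky part I cannot figure out is how to tie the default "w" to "x" properly, so that when weighted.mean is called inside stat_summary it uses the correct weights even though it cannot read "w".
As I mentioned, there is also a ddply workaround that produces a weighted-mean plot. Because it is based on creating a new source data frame containing only the grouping variable and the weighted means, it does not allow faceting!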
weighted.fictious <- function(xxxx, yyyy) {
ddply(xxxx, .(yyyy), function(m) data.frame(fictious_weightedmean=weighted.mean(m$value, m$importance, na.rm = FALSE)))
}
ggplot(data = weighted.fictious(xxxx = platonic, yyyy = platonic$greek_letter), aes(x=yyyy, y=fictious_weightedmean))+
geom_bar(stat = "identity")
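Thanks!
ggplot's built-in summary functions are not always useful; very often you are better off computing your summary in a separate step and then plotting it.
Your basic example plot is actually incorrect. It shows means of 5 and 7 for delta and epsilon of "aether", which is clearly not the case in the original data (both values are 1). Those are, however, the values for the first element in the data frame ("water"). The error arises because ggplot builds its facets in alphabetical order while you pass in raw vectors (platonic$value rather than simply value), so values end up in the wrong place. When using ggplot you should always pass bare, unquoted column names, so that ggplot can work out how to handle the associated data.
The correct version of the basic plot is: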
g <- ggplot(data = platonic)+
stat_summary(mapping = aes(x=greek_letter, y=value), fun.y = "mean", geom = "bar", na.rm = TRUE, inherit.aes = FALSE)+
facet_wrap(~elements~existence)
print(g)
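As for using weighted.mean, as I said above, the only sensible thing to do here is to compute it separately and plot the result:
library(dplyr)
# The weights argument of weighted.mean() is `w`; `weights =` would be silently ignored
platonic.weighted <- platonic %>%
  group_by(elements, existence, greek_letter) %>%
  summarize(value = weighted.mean(value, w = importance))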
Since the resulting data frame still contains all the column names used in the first plot, you can simply swap in the new data set:
g.weighted <- g %+% platonic.weighted
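In this example the two plots are identical, because every elements/greek_letter combination has exactly one row, but your mileage may vary.
Your question is a little unclear about the intended end result, but based on the example given I assume you want a weighted mean for each Greek letter. We can do that easily with summarize, or, if you really want to, you can use mutate instead to insert a column of weighted means without losing the original data: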
# `w =` is the weights argument of weighted.mean()
platonic.weighted <- platonic %>%
  group_by(greek_letter) %>%
  mutate(weighted.letter = weighted.mean(value, w = importance))
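As a sanity check on the per-letter numbers, here is a base-R sketch that rebuilds only the three relevant columns of `platonic` and compares the plain mean, the weighted mean, and a call that passes the weights under the wrong name. The names `check`, `by_letter`, `plain`, `weighted` and `ignored` are illustrative only.

```r
# Same row order as the platonic data frame: 5 elements x 6 Greek letters
greek_letter <- rep(c("alpha", "beta", "gamma", "delta", "epsilon", "zeta"), times = 5)
value <- c(0, 0, 0, 5, 7, 0,  0, 1, 0, 20, 3, 0,  0, 2, 2, 1, 8, 0,
           0, 8, 10, 4, 2, 0,  0, 0, 0, 1, 1, 0)
importance <- rep(c(20, 10, 3, 9, 50), each = 6)
check <- data.frame(greek_letter, value, importance)

by_letter <- split(check, check$greek_letter)

# Unweighted mean per letter
plain <- sapply(by_letter, function(d) mean(d$value))

# Weighted mean per letter, weights passed through the `w` argument
weighted <- sapply(by_letter, function(d) weighted.mean(d$value, w = d$importance))

# `weights` is not an argument of weighted.mean(); it falls into `...`
# and is ignored, so this is just the plain mean again
ignored <- sapply(by_letter, function(d) weighted.mean(d$value, weights = d$importance))

print(round(cbind(plain, weighted, ignored), 3))
```

For delta, for instance, the plain mean is 31/5 = 6.2, while the weighted mean is 389/92, roughly 4.23, so the two summaries genuinely differ once several rows share a group.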