Percentage histogram with facet_wrap
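I am trying to combine a percentage histogram with facet_wrap, but the percentages are computed over all of the data rather than within each group. I want each histogram to show the distribution within one group, not relative to the whole population. I know it is possible to build several plots and combine them with multiplot.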
library(ggplot2)
library(scales)
library(dplyr)
set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(c("a", "b", "c", "d", "e", "f", "g", "h", "i"), 100))
tmp <- df %>%
  mutate(group = "ALL")
df <- rbind(df, tmp)
ggplot(df, aes(age)) +
  geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)
Output:
Try using y = stat(density) (or y = ..density.. before ggplot2 version 3.0.0) instead of y = (..count..)/sum(..count..):
ggplot(df, aes(age, group = group)) +
  geom_histogram(aes(y = stat(density) * 5), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)
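From ?geom_histogram, under "Computed variables":
density : density of points in bin, scaled to integrate to 1
We multiply by 5 (the bin width) because the y-axis is a density (the area integrates to 1), not a percentage (the bar heights sum to 1); see Hadley's comment (thanks to @MariuszSiatka).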
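As a quick check of the bin-width argument above, here is a minimal sketch that multiplies the density by the computed bin width instead of a hard-coded 5. It assumes ggplot2 3.3.0 or later, where after_stat() is available, and relies on width being one of the variables computed by stat_bin. The data frame mirrors the question's nine groups; the names p, built and panel_totals are introduced here only for illustration.

```r
library(ggplot2)
library(scales)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(letters[1:9], 100))

# density * width = count / sum(count), computed separately for each panel
p <- ggplot(df, aes(age)) +
  geom_histogram(aes(y = after_stat(density * width)), binwidth = 5) +
  scale_y_continuous(labels = percent) +
  facet_wrap(~ group, ncol = 5)

# Pull the computed layer data and add up the bar heights in each panel
built <- layer_data(p)
panel_totals <- tapply(built$y, built$PANEL, sum)
print(panel_totals)
```

If the density is really computed per facet, every entry of panel_totals should be 1 up to floating-point error, which is what "heights sum to 1" means here.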
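Although facet_wrap does not appear to run the geom_histogram percentage calculation separately within each subset, consider building a list of plots separately and then arranging them in a grid. Specifically, call by to run your ggplots on the subsets of group, then call gridExtra::grid.arrange() (an actual package function) to roughly mimic facet_wrap: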
library(ggplot2)
library(scales)
library(gridExtra)
...
grp_plots <- by(df, df$group, function(sub) {
  ggplot(sub, aes(age)) +
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
    scale_y_continuous(labels = percent) + ggtitle(sub$group[[1]]) +
    theme(plot.title = element_text(hjust = 0.5))
})
grid.arrange(grobs = grp_plots, ncol = 5)
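However, to avoid repeating the y-axis and x-axis, consider setting theme conditionally inside the by call, assuming you know your groups ahead of time and there are only a handful of them.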
grp_plots <- by(df, df$group, function(sub) {
  # BASE GRAPH
  p <- ggplot(sub, aes(age)) +
    geom_histogram(aes(y = (..count..)/sum(..count..)), binwidth = 5) +
    scale_y_continuous(labels = percent) + ggtitle(sub$group[[1]])

  # CONDITIONAL theme() CALLS
  # (the group names below appear to assume groups a-i laid out in 5 columns)
  if (sub$group[[1]] %in% c("a")) {
    # keep the y-axis, hide the x-axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.x = element_blank(),
                   axis.text.x = element_blank(), axis.ticks.x = element_blank())
  } else if (sub$group[[1]] %in% c("f")) {
    # keep both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5))
  } else if (sub$group[[1]] %in% c("b", "c", "d", "e")) {
    # hide both axes
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(),
                   axis.text.y = element_blank(), axis.ticks.y = element_blank(),
                   axis.title.x = element_blank(), axis.text.x = element_blank(),
                   axis.ticks.x = element_blank())
  } else {
    # every other group: hide the y-axis, keep the x-axis
    p <- p + theme(plot.title = element_text(hjust = 0.5), axis.title.y = element_blank(),
                   axis.text.y = element_blank(), axis.ticks.y = element_blank())
  }
  return(p)
})
grid.arrange(grobs = grp_plots, ncol = 5)
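If the group names are not known ahead of time, one possible variant is to decide which axes to hide from each plot's position in the grid rather than from its name. The sketch below is an assumption-laden illustration, not the answer's method: it swaps by for split plus lapply so the index is available, uses after_stat() (ggplot2 3.3.0 or later), and introduces the hypothetical names n_col, subsets, first_col and nothing_below. Unlike the code above, it keeps the x-axis on any panel with no panel beneath it.

```r
library(ggplot2)
library(scales)
library(gridExtra)

set.seed(1)
df <- data.frame(age = runif(900, min = 10, max = 100),
                 group = rep(letters[1:9], 100))

n_col <- 5                          # number of grid columns (assumed layout)
subsets <- split(df, df$group)      # one data frame per group, in sorted order

grp_plots <- lapply(seq_along(subsets), function(i) {
  p <- ggplot(subsets[[i]], aes(age)) +
    geom_histogram(aes(y = after_stat(count / sum(count))), binwidth = 5) +
    scale_y_continuous(labels = percent) +
    ggtitle(names(subsets)[i]) +
    theme(plot.title = element_text(hjust = 0.5))

  first_col <- (i - 1) %% n_col == 0             # leftmost panel of its row
  nothing_below <- i + n_col > length(subsets)   # no panel underneath this one

  if (!first_col) {
    p <- p + theme(axis.title.y = element_blank(), axis.text.y = element_blank(),
                   axis.ticks.y = element_blank())
  }
  if (!nothing_below) {
    p <- p + theme(axis.title.x = element_blank(), axis.text.x = element_blank(),
                   axis.ticks.x = element_blank())
  }
  p
})

grid.arrange(grobs = grp_plots, ncol = n_col)
```

Because each plot is built from its own subset, sum(count) only sees that group's bins, so the bar heights in every panel add up to 100%.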