How to apply after_stat with faceting in ggplot2?
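I am using a moving average to remove the day-of-week effect in vaccine administration, so that I can see the overall trend stratified by various factors. I can create a bar chart of the rolling mean that shows the overall data correctly. But when I stratify, i.e. create facets, "ghost" bars of decreasing height appear in the lead-in period of the moving average (the first six days, where there should be no bars). How can I avoid this?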
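As a quick illustration of why a 7-day window removes a day-of-week effect (a sketch on made-up numbers, not the question's data): if the weekly pattern repeats exactly, every trailing 7-day window contains each weekday once, so the pattern averages out and only the trend remains, lagged by three days. Here stats::filter() with sides = 1 stands in for zoo::rollmean(x, 7, align = "right", fill = NA); the two should agree for series without missing values.

```r
# Made-up series: a linear trend plus a fixed day-of-week pattern
days <- 1:28
weekday_effect <- c(-3, -1, 0, 1, 2, 4, -3)  # sums to 0 over one week
counts <- 10 + 0.5 * days + weekday_effect[(days - 1) %% 7 + 1]

# Trailing (right-aligned) 7-day mean; the first 6 values are NA
roll7 <- as.numeric(stats::filter(counts, rep(1 / 7, 7), sides = 1))

# From day 7 on, the weekday effect has cancelled out: what remains is
# the trend evaluated at the middle of each window, i.e. at day - 3
print(round(roll7, 2))
```

The NA lead-in is exactly the stretch where, in the facetted plot, no bars should be drawn.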
Correct chart (no stratification): g
Chart with "ghost" bars in the moving-average lead-in period: g + facet_grid(race~., scales="free_y")
My code:
library(tidyverse)
# Make fake data: count of doses per day for 70 days, increasing over the 70 days, with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10, 70, replace=T) * (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
# Use that to create a data frame where one record is the administration of one dose, giving the dose, vaccine brand, 1st or 2nd dose, pt race, & pt gender
doses <- data.frame(Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay)
, whichDose = factor(c(rep(1,sum(nPerDay[1:30])), sample(1:2, sum(nPerDay[31:70]), replace=T)))
, gender=sample(c("F", "M"), sum(nPerDay), replace=T)
, race=sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace=T)
, brand=sample(c("Pf", "Mo"), sum(nPerDay), replace=T)
)
# Plot the doses administered each day as stacked bars, with bar color indicating 1st or 2nd dose
(ggplot(data=doses, mapping=aes(x=Admin_date))#, fill=whichDose))
+ geom_bar(position = "stack")
+ geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1)
)
# Change the bars in the prior plot into rolling 7-day averages, but keep the line as a daily total count.
g <- (
ggplot(data=doses, mapping=aes(x=Admin_date))#, fill=whichDose))
+ geom_bar(position = "stack"
, mapping = aes(y=zoo::rollmean(..count.., 7, align="right", fill=NA))
, stat="bin", binwidth=1
)
+ geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1)
+ labs(y="doses", fill="Which dose,\n7d avg count")
)
g # display this base graph
# explore the data
g + facet_grid(race~., scales="free_y") # See if the increasing trend and 1st vs 2nd dose proportions are similar across races.
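I know I can avoid this by creating an intermediate data frame that precomputes the moving averages for the stratifications I want. But there must be a way to do it on the fly in R, based on https://yjunechoe.github.io/posts/2020-09-26-demystifying-stat-layers-ggplot2/ and maybe the after_stat() function, but I can't figure it out. I'm hoping for a simple solution that I can share with my team, so that others (with less R experience) can add a faceting function to the base graph to explore the many factors we have; we have more than just gender, race, brand, and whichDose. If I can get rid of the ghost bars, they can add code like this to get other stratifications: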
# look at other stratifications
g + facet_grid(gender ~ ., scales="free_y")
g + facet_grid(race~brand, scales="free_y")
g + facet_grid(race~gender, scales="free_y")
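For comparison, the intermediate-data-frame workaround mentioned above can be sketched in base R. The toy data and the trailing7() helper are hypothetical stand-ins for the real doses data and zoo::rollmean(); the point is the shape of the approach: count per day and stratum, keep zero-count days, then roll within each stratum. Its drawback is the one the question names: every new stratification (e.g. race and brand together, via ave(x, race, brand, FUN = ...)) needs its own precomputation.

```r
# Hypothetical toy data, one row per dose, shaped like the question's 'doses'
set.seed(1)
all_days <- seq(as.Date("2021-01-01"), by = "day", length.out = 14)
toy <- data.frame(
  Admin_date = sample(all_days, 300, replace = TRUE),
  race = sample(c("A", "B"), 300, replace = TRUE)
)

# 1. Count doses per day and stratum; fixing the factor levels keeps
#    days with zero doses in the table
day_f <- factor(as.character(toy$Admin_date), levels = as.character(all_days))
daily <- as.data.frame(table(Admin_date = day_f, race = toy$race),
                       responseName = "n")
daily$Admin_date <- as.Date(as.character(daily$Admin_date))

# 2. Sort by stratum, then date, and take a trailing 7-day mean within
#    each stratum (the first 6 days of every stratum are NA)
daily <- daily[order(daily$race, daily$Admin_date), ]
trailing7 <- function(x) as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 1))
daily$roll7 <- ave(as.numeric(daily$n), daily$race, FUN = trailing7)

# 3. Plot the precomputed column, e.g. with ggplot2:
# ggplot(daily, aes(Admin_date, roll7)) + geom_col() +
#   facet_grid(race ~ ., scales = "free_y")
```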
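The problem is that expressions evaluated after the stat has been computed (such as the ..count.. or after_stat() mapping here) are not evaluated panel by panel: they receive the computed values of all panels as one concatenated vector. This is a problem for zoo::rollmean, because it only sees that single vector, so the first six windows of every panel after the first reach back into the end of the previous panel and produce the ghost bars. Therefore, you have to split the data by panel and loop over it.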
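The leak described above can be reproduced without ggplot2. In this sketch two hypothetical panels of 10 days each are stacked into one vector, the way the layer data reaches the after_stat() expression; trailing7() is a stand-in for zoo::rollmean(x, 7, align = "right", fill = NA), and ave() applies it per panel.

```r
# Two hypothetical panels with 10 days of counts each, stacked in one vector
panel <- factor(rep(c("1", "2"), each = 10))
count <- c(rep(50, 10), rep(5, 10))

trailing7 <- function(x) as.numeric(stats::filter(x, rep(1 / 7, 7), sides = 1))

# Rolling over the whole vector: panel 2's first 6 days borrow panel 1's counts
naive <- trailing7(count)
# Rolling within each panel: panel 2's first 6 days are NA, so no bar is drawn
per_panel <- ave(count, panel, FUN = trailing7)

print(round(naive[11:16], 2))  # ghost bars, decreasing toward panel 2's level
print(per_panel[11:16])        # all NA
```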
library(tidyverse)
nPerDay <- floor(sample(5:10, 70, replace=T) *
(1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
doses <- data.frame(
Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay),
whichDose = factor(c(rep(1,sum(nPerDay[1:30])),
sample(1:2, sum(nPerDay[31:70]), replace=T))),
gender=sample(c("F", "M"), sum(nPerDay), replace=T),
race=sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace=T),
brand=sample(c("Pf", "Mo"), sum(nPerDay), replace=T)
)
ggplot(data=doses[order(doses$race, doses$Admin_date), ],
mapping=aes(x=Admin_date)) +
geom_bar(position = "identity"
, mapping = aes(y=after_stat(
unlist(lapply(split(count, PANEL), zoo::rollmean,
7, align = "right", fill = NA))
))
, stat="bin", binwidth=1
) +
geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1) +
labs(y="doses", fill="Which dose,\n7d avg count") +
facet_grid(race ~ ., scales = "free_y")
#> Warning: Removed 30 rows containing missing values (geom_bar).
Created on 2021-02-20 by the reprex package (v1.0.0)
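One detail of the unlist(lapply(split(count, PANEL), ...)) idiom worth knowing: split() groups values by factor level, so the concatenated result comes back in panel order, not row order. It lines up with the layer's rows only if those rows are already sorted by PANEL. As far as I can tell that is what ggplot2 passes to after_stat(), because the stat is computed panel by panel, but ave() avoids the question by writing each result back to its original row. A small base-R illustration with made-up values and a 3-day window (trailing3() is a hypothetical helper):

```r
trailing3 <- function(x) as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 1))

# Rows deliberately interleaved by panel (not sorted by PANEL)
PANEL <- factor(c(1, 2, 1, 2, 1, 2, 1, 2))
count <- c(3, 30, 6, 60, 9, 90, 12, 120)

# split/unlist: all of panel 1's results, then all of panel 2's
via_split <- unlist(lapply(split(count, PANEL), trailing3), use.names = FALSE)

# ave: each result goes back to the row it came from
via_ave <- ave(count, PANEL, FUN = trailing3)

# Once the rows are sorted by PANEL, the two approaches agree
o <- order(PANEL)
sorted_split <- unlist(lapply(split(count[o], PANEL[o]), trailing3), use.names = FALSE)
sorted_ave <- ave(count[o], PANEL[o], FUN = trailing3)
print(all.equal(sorted_split, sorted_ave))
```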
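Similar to @teunbrand's approach (credit to him for the concise explanation of the problem; I have nothing to add to it), but using dplyr and a helper function, you can achieve the desired result as follows: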
library(tidyverse)
set.seed(42)
# Make fake data: count of doses per day for 70 days, increasing over the 70 days, with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10, 70, replace = T) * (1 + ((1:70) * 3 / 70)) * (.5 + (.5 * (1:70 %% 7) / 6)))
# Use that to create a data frame where one record is the administration of one dose, giving the dose, vaccine brand, 1st or 2nd dose, pt race, & pt gender
doses <- data.frame(
Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay),
whichDose = factor(c(rep(1, sum(nPerDay[1:30])), sample(1:2, sum(nPerDay[31:70]), replace = T))),
gender = sample(c("F", "M"), sum(nPerDay), replace = T),
race = sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace = T),
brand = sample(c("Pf", "Mo"), sum(nPerDay), replace = T)
)
my_rollmean <- function(count, group) {
data.frame(group = group, count = count) %>%
group_by(group) %>%
mutate(roll = zoo::rollmean(count, 7, align = "right", fill = NA)) %>%
pull(roll)
}
# Change the bars in the prior plot into rolling 7-day averages, but keep the line as a daily total count.
g <- ggplot(data = doses, mapping = aes(x = Admin_date)) +
geom_bar(
position = "stack",
mapping = aes(y = my_rollmean(..count.., ..PANEL..)),
stat = "bin", binwidth = 1
) +
geom_line(aes(y = ..count.., fill = NULL), stat = "bin", binwidth = 1) +
labs(y = "doses", fill = "Which dose,\n7d avg count")
# explore the data
g + facet_grid(race ~ ., scales = "free_y") # See if the increasing trend and 1st vs 2nd dose proportions are similar across races.
#> Warning: Removed 30 rows containing missing values (position_stack).
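If the team would rather not depend on dplyr and zoo for the helper, a dependency-free variant is sketched below. The name my_rollmean_base() is hypothetical; it keeps my_rollmean()'s arguments, uses ave() (which preserves row order) with stats::filter() for the trailing mean, and returns all NA for a group shorter than the window, since stats::filter() refuses a filter longer than the series. Note that ggplot2 3.4.0 deprecated the ..count.. notation in favor of after_stat(count), so the commented usage line uses after_stat().

```r
# Hypothetical dependency-free variant of my_rollmean(): same arguments,
# trailing k-day mean computed separately within each group
my_rollmean_base <- function(count, group, k = 7) {
  ave(as.numeric(count), group, FUN = function(x) {
    # Too few values for one full window: nothing to draw
    if (length(x) < k) return(rep(NA_real_, length(x)))
    as.numeric(stats::filter(x, rep(1 / k, k), sides = 1))
  })
}

# Usage inside the plot (ggplot2 >= 3.4 syntax):
# geom_bar(mapping = aes(y = after_stat(my_rollmean_base(count, PANEL))),
#          stat = "bin", binwidth = 1, position = "stack")
```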