如何在 ggplot2 中应用 after_stat 和分面？

Question

我正在使用移动平均线来消除疫苗分配中的星期几效应，以查看按各种因素分层的总体趋势。我可以创建一个滚动平均值的条形图来正确显示整体数据。但是当我分层或创建小平面时，下降高度的“幽灵”条出现在导入期（应该没有条）。我怎样才能避免这种情况？

正确的图表（无分层）：g

移动平均线导入期带有“幽灵”柱的图表：g + facet_grid(race~., scales="free_y")

我的代码

library(tidyverse)
# Make fake data: count of doses per day for 70 days, increasing over the 70 days, with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10, 70, replace=T) * (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
# Use that to create a data frame where one record is the administration of one dose, giving the dose, vaccine brand, 1st or 2nd dose, pt race, & pt gender
doses <- data.frame(Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay)
                    , whichDose = factor(c(rep(1,sum(nPerDay[1:30])), sample(1:2, sum(nPerDay[31:70]), replace=T)))
                    , gender=sample(c("F", "M"), sum(nPerDay), replace=T)
                    , race=sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace=T)
                    , brand=sample(c("Pf", "Mo"), sum(nPerDay), replace=T)
)

# plot the doses administered each day, with stacked bars', with bars' color indicating # of 1st or second dose
(ggplot(data=doses, mapping=aes(x=Admin_date))#, fill=whichDose))
  + geom_bar(position = "stack")
  + geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1)
)

# Change the bars in the prior plot into rolling 7-day averages, but keep the line as a daily total count.
g <- (
  ggplot(data=doses, mapping=aes(x=Admin_date))#, fill=whichDose)) 
  + geom_bar(position = "stack"
             , mapping = aes(y=zoo::rollmean(..count.., 7, align="right", fill=NA))
             , stat="bin", binwidth=1
  )
  + geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1)
  + labs(y="doses", fill="Which dose,\n7d avg count")
)
g # display this base graph

# explore tha data
g + facet_grid(race~., scales="free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.

我知道我可以通过创建一个中间数据框来避免这种情况，该数据框预先计算出我想要的分层的移动平均值。但是必须有一种方法可以根据 https://yjunechoe.github.io/posts/2020-09-26-demystifying-stat-layers-ggplot2/ and maybe the after_stat() 函数在 R 中即时执行此操作。但我想不通。我希望有一个简单的解决方案，我可以与我的工作团队分享，这样其他人（R 经验较少的人）可以向基础图表添加分面函数来探索我们拥有的许多因素——我们拥有的不仅仅是性别、种族、品牌和哪个剂量。如果我能摆脱幽灵酒吧，他们可以添加这样的代码来获得其他分层：

# look at other stratifications
g + facet_grid(gender, scales="free_y")
g + facet_grid(race~brand, scales="free_y")
g + facet_grid(race~gender, scales="free_y")

Answer 1

问题是在计算统计数据之后，在统计数据之后发生的任何计算都不一定将面板考虑在内。这给 zoo::rollmean 带来了问题，因为它只看到一个值向量。因此，您必须按面板循环数据。

library(tidyverse)

nPerDay <- floor(sample(5:10, 70, replace=T) * 
                   (1 + ((1:70)*3/70)) * (.5 + (.5*(1:70 %% 7)/6)))
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay),
  whichDose = factor(c(rep(1,sum(nPerDay[1:30])), 
                       sample(1:2, sum(nPerDay[31:70]), replace=T))),
  gender=sample(c("F", "M"), sum(nPerDay), replace=T),
  race=sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace=T),
  brand=sample(c("Pf", "Mo"), sum(nPerDay), replace=T)
)


ggplot(data=doses[order(doses$race, doses$Admin_date), ], 
       mapping=aes(x=Admin_date)) +
  geom_bar(position = "identity"
           , mapping = aes(y=after_stat(
             unlist(lapply(split(count, PANEL), zoo::rollmean, 
                           7, align = "right", fill = NA))
           ))
           , stat="bin", binwidth=1
  ) +
  geom_line(aes(y=..count.., fill=NULL), stat = "bin", binwidth=1) + 
  labs(y="doses", fill="Which dose,\n7d avg count") +
  facet_grid(race ~ ., scales = "free_y")
#> Warning: Removed 30 rows containing missing values (geom_bar).

^{由 reprex package (v1.0.0)}

于 2021-02-20 创建

Answer 2

类似于@teunbrand 的方法（他对问题的简明解释值得赞扬，我没有什么可补充的）但是利用 dplyr 和一个辅助函数你可以实现你想要的结果如下：

library(tidyverse)

set.seed(42)

# Make fake data: count of doses per day for 70 days, increasing over the 70 days, with a 50% variance per day-of-week
nPerDay <- floor(sample(5:10, 70, replace = T) * (1 + ((1:70) * 3 / 70)) * (.5 + (.5 * (1:70 %% 7) / 6)))
# Use that to create a data frame where one record is the administration of one dose, giving the dose, vaccine brand, 1st or 2nd dose, pt race, & pt gender
doses <- data.frame(
  Admin_date = rep(as.Date("2020-12-31") + 1:70, nPerDay),
  whichDose = factor(c(rep(1, sum(nPerDay[1:30])), sample(1:2, sum(nPerDay[31:70]), replace = T))),
  gender = sample(c("F", "M"), sum(nPerDay), replace = T),
  race = sample(LETTERS[1:5], sum(nPerDay), c(.45, .25, .15, .1, .05), replace = T),
  brand = sample(c("Pf", "Mo"), sum(nPerDay), replace = T)
)

my_rollmean <- function(count, group) {
  data.frame(group = group, count = count) %>% 
    group_by(group) %>% 
    mutate(roll = zoo::rollmean(count, 7, align = "right", fill = NA)) %>% 
    pull(roll)
}

# Change the bars in the prior plot into rolling 7-day averages, but keep the line as a daily total count.
g <- ggplot(data = doses, mapping = aes(x = Admin_date)) +
  geom_bar(
    position = "stack",
    mapping = aes(y = my_rollmean(..count.., ..PANEL..)),
    stat = "bin", binwidth = 1
  ) +
  geom_line(aes(y = ..count.., fill = NULL), stat = "bin", binwidth = 1) +
  labs(y = "doses", fill = "Which dose,\n7d avg count")

# explore tha data
g + facet_grid(race ~ ., scales = "free_y") # See if the increasing trend and 1st vs 2nd dose porportions or similar across races.
#> Warning: Removed 30 rows containing missing values (position_stack).

如何在 ggplot2 中应用 after_stat 和分面？

How to apply after_stat with faceting in ggplot2?

r

graph

moving-average

ggplot2

正确的图表（无分层）：g

移动平均线导入期带有“幽灵”柱的图表：g + facet_grid(race~., scales="free_y")

我的代码