Connect stacked bar charts with multiple groups with lines or segments using ggplot2

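I am conducting a study on patients with a disease, assessing functional status on an ordinal scale at 3 different time points. I would like to connect the multiple groups in the stacked bar charts across these time points.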

I looked at these threads but could not solve the problem with their suggestions:

How to position lines at the edges of stacked bar charts

Is there an efficient way to draw lines between different elements in a stacked bar plot using ggplot2?

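Below is how I would ultimately like the figure from R to look (this version was generated in PRISM). It is a graphical representation of the frequency of each ordinal score (0 to 6) at the three time points. The top group has no patients with ordinal scores 3, 5 or 6: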

Intended FIGURE using PRISM

Data:

library(tidyverse)

mrs <- tibble(
  Score = c(0, 1, 2, 3, 4, 5, 6),
  pMRS = c(17, 2, 1, 0, 1, 0, 0),
  dMRS = c(2, 3, 2, 6, 4, 2, 2),
  fMRS = c(4, 4, 5, 4, 1, 1, 2)
)

This is the code I have tried so far:

mrs <- mrs %>% mutate(across(-Score,~paste(round(prop.table(.) * 100, 2)))) %>%
   pivot_longer(cols = c("pMRS", "dMRS", "fMRS"), names_to = "timepoint") %>% 
   mutate(Score=as.character(Score),
          value=as.numeric(value)) %>% 
   mutate(timepoint = factor(timepoint, 
                             levels= c("fMRS", 
                              "dMRS",
                              "pMRS"))) %>% 
   mutate(Score = factor(Score,
                         levels = c("6","5","4","3","2","1","0")))
mrs %>% ggplot(aes(y= timepoint, x= value, fill= Score))+
  geom_bar(color= "black", width = 0.6, stat= "identity") +
  scale_fill_manual(name= NULL,
                    breaks = c("6","5","4","3","2","1","0"), values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_discrete(breaks=c("pMRS",
                            "dMRS",
                            "fMRS"),
                   labels=c("Pre-mRS,  (N=21)",
                            "Discharge mRS,  (N=21)",
                            "Followup mRS,  (N=21)"))+
  theme_classic()

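I don't think there is an easy way to do this; you have to add these lines (semi-)manually yourself. What I propose below is adapted from an earlier answer and applied to your case. In essence, it exploits the fact that geom_area() is stackable in the same way as the bar chart. The downside is that you have to enter the coordinates where the bars start and end by hand, and you have to do this for each pair of stacked bars.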

library(tidyverse)

# mrs <- tibble(...) %>% mutate(...) # omitted for brevity, same as question

mrs %>% ggplot(aes(x= value, y= timepoint, fill= Score))+
  geom_bar(color= "black", width = 0.6, stat= "identity") +
  geom_area(
    # Last two stacked bars
    data = ~ subset(.x, timepoint %in% c("pMRS", "dMRS")),
    # These exact values depend on the 'width' of the bars
    aes(y = c("pMRS" = 2.7, "dMRS" = 2.3)[as.character(timepoint)]),
    position = "stack", outline.type = "both", 
    # Alpha set to 0 to hide the fill colour
    alpha = 0, colour = "black",
    orientation = "y"
  ) +
  geom_area(
    # First two stacked bars
    data = ~ subset(.x, timepoint %in% c("dMRS", "fMRS")),
    aes(y = c("dMRS" = 1.7, "fMRS" = 1.3)[as.character(timepoint)]),
    position = "stack", outline.type = "both", alpha = 0, colour = "black",
    orientation = "y"
  ) +
  scale_fill_manual(name= NULL,
                    breaks = c("6","5","4","3","2","1","0"),
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_discrete(breaks=c("pMRS",
                            "dMRS",
                            "fMRS"),
                   labels=c("Pre-mRS,  (N=21)",
                            "Discharge mRS,  (N=21)",
                            "Followup mRS,  (N=21)"))+
  theme_classic()
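
The hard-coded 2.7/2.3 and 1.7/1.3 values follow from the bar width: a discrete axis places its levels at positions 1, 2, 3, ..., and a bar of width w spans from w/2 below its position to w/2 above it. Below is a minimal sketch of that arithmetic, assuming the factor levels and `width = 0.6` used above. `bar_edges()` is a hypothetical helper written for illustration, not a ggplot2 function.

```r
# Hypothetical helper (not part of ggplot2): for each pair of neighbouring
# bars on a discrete axis, return the two facing edges a connector must span.
bar_edges <- function(levels, width) {
  pos <- seq_along(levels)             # discrete levels sit at 1, 2, 3, ...
  n <- length(levels)
  data.frame(
    lower_bar  = levels[-n],
    upper_bar  = levels[-1],
    lower_edge = pos[-n] + width / 2,  # upper edge of the lower bar
    upper_edge = pos[-1] - width / 2   # lower edge of the upper bar
  )
}

# Same level order and bar width as in the plot above
edges <- bar_edges(c("fMRS", "dMRS", "pMRS"), width = 0.6)
print(edges)
```

If you change `width` in geom_bar(), the values passed to the two geom_area() layers have to change accordingly.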

Arguably, building a separate data.frame for the lines is more straightforward, but it is also a bit messier.
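
A rough sketch of that alternative, under some assumptions: it relies on ggplot2's default stacking order (with levels 6:0, score 0 ends up nearest zero), it uses unrounded percentages, and the names `tp_levels`, `bar_width`, `bounds` and `segments` are illustrative only. The idea is to compute the cumulative boundary of every score in each bar, pair each bar with the one above it, and draw the connectors with geom_segment().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

mrs <- tibble::tibble(
  Score = 0:6,
  pMRS = c(17, 2, 1, 0, 1, 0, 0),
  dMRS = c(2, 3, 2, 6, 4, 2, 2),
  fMRS = c(4, 4, 5, 4, 1, 1, 2)
)

tp_levels <- c("fMRS", "dMRS", "pMRS")  # bottom-to-top order on the y axis
bar_width <- 0.6

mrs_long <- mrs %>%
  mutate(across(-Score, ~ prop.table(.x) * 100)) %>%
  pivot_longer(-Score, names_to = "timepoint") %>%
  mutate(timepoint = factor(timepoint, levels = tp_levels),
         Score = factor(Score, levels = 6:0))

# Outer boundary of every score within each bar, accumulated from score 0
# upwards (assumption: this mirrors ggplot2's default stacking order).
bounds <- mrs_long %>%
  arrange(timepoint, desc(Score)) %>%
  group_by(timepoint) %>%
  mutate(boundary = cumsum(value), ypos = as.integer(timepoint)) %>%
  ungroup()

# One row per score and per pair of neighbouring bars: the segment runs from
# the upper edge of one bar to the lower edge of the bar above it.
segments <- bounds %>%
  arrange(Score, ypos) %>%
  group_by(Score) %>%
  mutate(x_next = lead(boundary), y_next = lead(ypos)) %>%
  ungroup() %>%
  filter(!is.na(x_next)) %>%
  transmute(Score,
            x = boundary, y = ypos + bar_width / 2,
            xend = x_next, yend = y_next - bar_width / 2)

p <- ggplot(mrs_long, aes(x = value, y = timepoint)) +
  geom_col(aes(fill = Score), colour = "black", width = bar_width) +
  geom_segment(data = segments,
               aes(x = x, y = y, xend = xend, yend = yend),
               inherit.aes = FALSE) +
  scale_fill_manual(name = NULL, breaks = as.character(6:0),
                    values = c("#000000", "#294e63", "#496a80", "#7c98ac",
                               "#b3c4d2", "#d9e0e6", "#ffffff")) +
  theme_classic()
p
```

The extra bookkeeping is the messy part, but the connector positions now follow from the data and `bar_width` instead of being typed in by hand.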

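You are essentially creating an alluvial plot, so you can use the ggalluvial package. Below is the desired look. I laid the time points out horizontally, because reading them from left to right feels more natural (at least in Western societies). If you really want the original orientation, you can simply add coord_flip().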

Also, see further below for a suggestion that I personally find to be a more compelling visualisation.

See the following source for more information on alluvial plots

library(tidyverse)
library(ggalluvial)

# I personally prefer to create a new object when you do data modifications
mrs_long <- 
  mrs %>% mutate(across(-Score,~paste(round(prop.table(.) * 100, 2)))) %>%
  pivot_longer(cols = c("pMRS", "dMRS", "fMRS"), names_to = "timepoint") %>% 
  mutate(Score=as.character(Score),
         value=as.numeric(value),
         ## I've reversed the level order
         timepoint = factor(timepoint, levels= rev(c("fMRS", "dMRS", "pMRS"))),
         Score = factor(Score, levels = 6:0))

ggplot(mrs_long,
       aes(y = value, x = timepoint)) +
  geom_flow(aes(alluvium = Score), alpha= .9, 
            lty = 2, fill = "white", color = "black",
            curve_type = "linear", 
            width = .5) +
  geom_col(aes(fill = Score), width = .5, color = "black") +
  scale_fill_manual(NULL, breaks = 6:0,
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_continuous(expand = c(0,0)) +
  cowplot::theme_minimal_hgrid()

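Arguably more compelling: I find that the message comes across better when you make full use of the "alluvial look". For example, it could look like this: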

ggplot(mrs_long,
       aes(y = value, x = timepoint, fill = Score)) +
  geom_alluvium(aes(alluvium = Score), alpha= .9, color = "black") +
  scale_fill_manual(NULL, breaks = 6:0,
                    values=  c("#000000","#294e63", "#496a80","#7c98ac", "#b3c4d2","#d9e0e6","#ffffff"))+
  scale_y_continuous(expand = c(0,0)) +
  cowplot::theme_minimal_hgrid()