Issue with preparing dataset for use with ggalluvial and creating alluvial diagram

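I am new to the ggalluvial package. I am currently working with a donations dataset that I would like to represent with an alluvial diagram. Here is a sample of the dataset I am using: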

   donor_ID recip_name donation_amt month_year    
   <chr>    <chr>             <dbl> <chr>         
 1 1        B, P                 25 September 2019
 2 2        S, B                 27 July 2019     
 3 3        K, A                 50 June 2019     
 4 1        H, K                100 April 2019    
 5 2        W, E                  3 December 2019 
 6 3        S, B                  9 August 2019   
 7 1        C, J                 25 September 2019
 8 2        B, J                 50 October 2019
 9 3        W, E                400 August 2019
10 1        S, B                 20 December 2019

The output of dput() on this dataset is as follows:

structure(list(donor_ID = c("1", "2", "3", "1", "2", "3", "1", 
"2", "3", "1"), recip_name = c("B, P", "S, B", "K, A", "H, K", 
"W, E", "S, B", "C, J", "B, J", "W, E", "S, B"), donation_amt = c(25, 
27, 50, 100, 3, 9, 25, 50, 400, 20), month_year = c("September 2019", 
"July 2019", "June 2019", "April 2019", "December 2019", "August 2019", 
"September 2019", "October 2019", "August 2019", "December 2019"
)), class = "data.frame", row.names = c(NA, -10L))

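I want to represent each donor's choice of recipient (recip_name), which may change from month to month (donor preference), with donor_ID identifying the individual donor. The resulting alluvial diagram should show the changes between months, with flows proportional to the total donation amount (donation_amt) moving between recipients. Here is the script I wrote to accomplish this: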

df$recip_name <- as.factor(df$recip_name)
df %>% 
  filter(transaction_dt < as.Date("2020-01-01")) %>% 
  select(donor_ID, recip_name, donation_amt, month_year) %>% 
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")

After executing this R code, this is the error I receive:

Error in f(...) : 
  Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).

I have researched how to set up data properly for use with ggalluvial, but to no avail. How can I correctly draw the desired alluvial diagram with this data?

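Currently, the error thrown by the plot layers is less informative than the one thrown by the alluvial-structure test itself. The test also uses different terminology: id for alluvium, key for x, and value for stratum. (My apologies for this! These will be changed in a future version.) Your data are attempting to take lodes (long) form, and the is_lodes_form() test (below) reports that there are duplicated id-axis pairings.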

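I had not noticed it before, but there is indeed at least one duplicated pairing: two rows have donor_ID = 1 and month_year = September 2019. Alluvial plots require that each alluvium (id) pass through each axis at most once. After removing this duplicate and one other, the alluvial plot does render (below). Presumably because this is only a sample of the data, the plot is sparse.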
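To see which rows collide, the duplicated pairings can be listed directly. The following is a minimal sketch using dplyr on the sample data from the question; the names dupes and n_rows are illustrative and not part of ggalluvial.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# Count rows per donor-month pair and keep the pairs that occur more than once.
# These are the id-axis pairings that the lodes-form test rejects.
dupes <- df %>%
  count(donor_ID, month_year, name = "n_rows") %>%
  filter(n_rows > 1)

dupes
```

On this sample, two pairings are flagged: donor 1 in September 2019 (rows 1 and 7) and donor 3 in August 2019 (rows 6 and 9).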

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(stringr)
library(ggalluvial)
#> Loading required package: ggplot2

df <- structure(list(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E", "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019", "December 2019", "August 2019", "September 2019", "October 2019", "August 2019", "December 2019")
), class = "data.frame", row.names = c(NA, -10L))
df$recip_name <- as.factor(df$recip_name)

is_lodes_form(df, key = month_year, value = recip_name, id = donor_ID)
#> Duplicated id-axis pairings.
#> [1] FALSE

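df %>%
  # drop one row from each duplicated donor_ID/month_year pairing (rows 7 and 9)
  slice(-c(7, 9)) %>%
  # month number (1-12); computed here but not mapped to an aesthetic below
  mutate(month = match(str_remove(month_year, " 2019"), month.name)) %>%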
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")

Created on 2022-01-30 by the reprex package (v2.0.1)

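The plot is quite sparse, probably because this is only a sample of your data. You will also have to do more to clean up the plot, for example by converting the character-valued month_year to a factor or a date.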
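One possible way to do that conversion is sketched below. It assumes every label has the form "<Month> 2019", as in the sample; data spanning several years would need the year handled as well. The names calendar_levels and month_year_fct are illustrative.

```r
# Month labels from the sample data (all in 2019).
month_year <- c("September 2019", "July 2019", "June 2019", "April 2019",
                "December 2019", "August 2019", "September 2019",
                "October 2019", "August 2019", "December 2019")

# Candidate levels in calendar order; month.name is a built-in constant
# holding the English month names.
calendar_levels <- paste(month.name, "2019")

# Keep only the months that actually occur, preserving calendar order,
# so that the x axis is no longer sorted alphabetically.
month_year_fct <- factor(
  month_year,
  levels = calendar_levels[calendar_levels %in% month_year]
)

levels(month_year_fct)
```

Assigning the result back to df$month_year before plotting should put the axes in chronological rather than alphabetical order.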

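If you want to distinguish donations by the same donor to different recipients, then the unit of observation you probably want is the interaction of donor_ID and recip_name. Passing that to the alluvium aesthetic, recip_name to stratum, and donor_ID to fill may produce the plot you are after.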
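A rough sketch of that suggestion follows. The column name donation_id is hypothetical, and fill is mapped only in the flow layer here, which is a choice made for this sketch rather than something the suggestion prescribes. In the ten-row sample every donor-recipient pair occurs in a single month, so the result will likely show strata with few or no flows; the full data may look different.

```r
library(ggalluvial)  # also loads ggplot2

# Sample data from the question, with all ten rows kept.
df <- data.frame(
  donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E",
                 "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year = c("September 2019", "July 2019", "June 2019", "April 2019",
                 "December 2019", "August 2019", "September 2019",
                 "October 2019", "August 2019", "December 2019")
)

# One alluvium per donor-recipient pair (hypothetical column name).
df$donation_id <- interaction(df$donor_ID, df$recip_name, drop = TRUE)

# With this id, no id-axis pairing is repeated in the sample.
pairs_unique <- !any(duplicated(df[c("donation_id", "month_year")]))

# Same layers as the plot above, with the new alluvium and donor-based fill.
p <- ggplot(df, aes(x = month_year, y = donation_amt,
                    stratum = recip_name, alluvium = donation_id)) +
  geom_flow(aes(fill = donor_ID), stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")
```

Printing p renders the plot. Because no rows need to be dropped with this id, every donation in the sample is kept.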