Issue with preparing dataset for use with ggalluvial and creating alluvial diagram
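I have just started using the ggalluvial package. I am currently working with a donation dataset that I would like to represent with an alluvial diagram. Here is a sample of the dataset I am working with: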
   donor_ID recip_name donation_amt month_year
   <chr>    <chr>             <dbl> <chr>
 1 1        B, P                 25 September 2019
 2 2        S, B                 27 July 2019
 3 3        K, A                 50 June 2019
 4 1        H, K                100 April 2019
 5 2        W, E                  3 December 2019
 6 3        S, B                  9 August 2019
 7 1        C, J                 25 September 2019
 8 2        B, J                 50 October 2019
 9 3        W, E                400 August 2019
10 1        S, B                 20 December 2019
The output of dput() on this dataset is as follows:
structure(list(donor_ID = c("1", "2", "3", "1", "2", "3", "1",
"2", "3", "1"), recip_name = c("B, P", "S, B", "K, A", "H, K",
"W, E", "S, B", "C, J", "B, J", "W, E", "S, B"), donation_amt = c(25,
27, 50, 100, 3, 9, 25, 50, 400, 20), month_year = c("September 2019",
"July 2019", "June 2019", "April 2019", "December 2019", "August 2019",
"September 2019", "October 2019", "August 2019", "December 2019"
)), class = "data.frame", row.names = c(NA, -10L))
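I want to represent individual donors' choices of recipient (recip_name), which may change from month to month (donor preference), with donor_ID identifying the individual donors. The resulting alluvial diagram should show the changes between months in a way that is also proportional to the total donation amount (donation_amt) moving between recipients. Below is the script I wrote to accomplish this: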
df$recip_name <- as.factor(df$recip_name)
df %>%
  filter(transaction_dt < as.Date("2020-01-01")) %>%
  select(donor_ID, recip_name, donation_amt, month_year) %>%
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")
After executing this R code, this is the error I receive:
Error in f(...) :
Data is not in a recognized alluvial form (see `help('alluvial-data')` for details).
I have researched how to set up data correctly for use with ggalluvial, but to no avail. How can I properly draw the desired alluvial diagram from these data?
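Currently, the error thrown by the plot layer is less informative than the one thrown by the alluvial structure test itself. The test also uses different terminology: id for alluvium, key for x, and value for stratum. (My apologies for that! These will be changed in a future version.) Your data are meant to be in lodes (long) form, and the is_lodes_form() test (below) reports that there are duplicated id-axis pairings.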
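I had not noticed it before, but there is indeed at least one duplicated pairing: two rows have donor_ID = 1 and month_year = September 2019. An alluvial diagram requires that each alluvium (id) pass through each axis at most once. After removing this row and one other (row 9, which repeats donor_ID = 3 in August 2019), the alluvial plot does render (below). Presumably because this is only a sample of the data, the plot is sparse.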
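To see which rows collide, one option is to flag every (donor_ID, month_year) pair that occurs more than once. This is only a base R sketch on the sample data; the names pair, is_dup, and dups are introduced here for illustration and are not part of ggalluvial.

```r
# Sample data from the question
df <- data.frame(
  donor_ID     = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name   = c("B, P", "S, B", "K, A", "H, K", "W, E",
                   "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year   = c("September 2019", "July 2019", "June 2019", "April 2019",
                   "December 2019", "August 2019", "September 2019",
                   "October 2019", "August 2019", "December 2019"),
  stringsAsFactors = FALSE
)

# An alluvium (donor_ID) may pass through each axis (month_year) at most once,
# so mark every row whose (donor_ID, month_year) pair appears more than once.
pair   <- df[, c("donor_ID", "month_year")]
is_dup <- duplicated(pair) | duplicated(pair, fromLast = TRUE)

# Rows involved in a duplicated pairing
dups <- df[is_dup, ]
print(dups)
```

On the sample this flags rows 1 and 7 (donor 1, September 2019) and rows 6 and 9 (donor 3, August 2019). Whether to drop one row of each pair, as the reprex does, or to handle them some other way depends on what the plot is meant to show.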
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(stringr)
library(ggalluvial)
#> Loading required package: ggplot2
df <- structure(list(
donor_ID = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
recip_name = c("B, P", "S, B", "K, A", "H, K", "W, E", "S, B", "C, J", "B, J", "W, E", "S, B"),
donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
month_year = c("September 2019", "July 2019", "June 2019", "April 2019", "December 2019", "August 2019", "September 2019", "October 2019", "August 2019", "December 2019")
), class = "data.frame", row.names = c(NA, -10L))
df$recip_name <- as.factor(df$recip_name)
is_lodes_form(df, key = month_year, value = recip_name, id = donor_ID)
#> Duplicated id-axis pairings.
#> [1] FALSE
df %>%
  slice(-c(7, 9)) %>%
  mutate(month = match(str_remove(month_year, " 2019"), month.name)) %>%
  ggplot(aes(x = month_year, y = donation_amt, stratum = recip_name,
             alluvium = donor_ID, fill = recip_name, label = recip_name)) +
  scale_fill_brewer(type = "qual", palette = "Set2") +
  geom_flow(stat = "alluvium", color = "darkgray") +
  geom_stratum() +
  theme_light() +
  theme(legend.position = "bottom") +
  ggtitle("Donor Preference")
Created on 2022-01-30 by the reprex package (v2.0.1)
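The plot is very sparse, probably because this is only a sample of your data. You will also need to do more to clean up the plot, for example by converting the character-valued month_year to a factor or a date; as a character vector, the months are placed along the x axis in alphabetical rather than calendar order.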
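As a sketch of that clean-up, month_year could be turned into a factor whose levels follow the calendar. This assumes every value has the form "<month name> 2019", as in the sample; month_levels is a name introduced here for illustration.

```r
# month_year column from the sample data
month_year <- c("September 2019", "July 2019", "June 2019", "April 2019",
                "December 2019", "August 2019", "September 2019",
                "October 2019", "August 2019", "December 2019")

# Calendar-ordered levels, built from base R's month.name constant
# (assumption: all observations fall in 2019)
month_levels <- paste(month.name, "2019")

# Factor levels control the order of a discrete x axis in ggplot2
month_fct <- factor(month_year, levels = month_levels)

# Months that actually occur, in calendar order
print(levels(droplevels(month_fct)))
```

For data spanning several years, parsing to a real date would likely be more robust than listing levels by hand.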
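If you want to distinguish donations made by the same donor to different recipients, then the unit of observation you probably want is the interaction of donor_ID and recip_name. Passing that to the alluvium aesthetic, recip_name to stratum, and donor_ID to fill might produce the plot you are after.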
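A minimal sketch of that mapping follows, under some assumptions: donor_recip is a hypothetical helper column, the uniqueness check uses base R rather than is_lodes_form(), and the plot object is only built, not rendered. In the ten-row sample every donor-recipient pair occurs in a single month, so flows would presumably only appear with the fuller data.

```r
# Sample data from the question
df <- data.frame(
  donor_ID     = c("1", "2", "3", "1", "2", "3", "1", "2", "3", "1"),
  recip_name   = c("B, P", "S, B", "K, A", "H, K", "W, E",
                   "S, B", "C, J", "B, J", "W, E", "S, B"),
  donation_amt = c(25, 27, 50, 100, 3, 9, 25, 50, 400, 20),
  month_year   = c("September 2019", "July 2019", "June 2019", "April 2019",
                   "December 2019", "August 2019", "September 2019",
                   "October 2019", "August 2019", "December 2019"),
  stringsAsFactors = FALSE
)

# Hypothetical helper column: one alluvium per donor-recipient pair
df$donor_recip <- interaction(df$donor_ID, df$recip_name, drop = TRUE)

# With this id, no (alluvium, axis) pairing repeats in the sample
n_dup <- sum(duplicated(df[, c("donor_recip", "month_year")]))
print(n_dup)

# Build the plot only if ggalluvial is installed
if (requireNamespace("ggalluvial", quietly = TRUE)) {
  library(ggalluvial)  # also attaches ggplot2
  p <- ggplot(df, aes(x = month_year, y = donation_amt,
                      stratum = recip_name, alluvium = donor_recip,
                      fill = donor_ID)) +
    geom_flow(stat = "alluvium", color = "darkgray") +
    geom_stratum() +
    theme_light() +
    theme(legend.position = "bottom")
  # Call print(p) to render the plot
}
```

Because the id is now finer than donor_ID, rows 7 and 9 no longer need to be dropped for the lodes-form requirement to hold.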