Alluvial plot with 2 different sources but a converging/shared variable [R]
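I have experience making alluvial plots with the `ggalluvial` package. However, I've run into a problem: I'm trying to create an alluvial plot that has two different sources converging on one variable.
Here is the example data: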
```r
library(dplyr)
library(ggplot2)
library(ggalluvial)
data <- data.frame(
  unique_alluvium_entires = seq(1:10),
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b")
)
```
This is the code I'm using to make the plot:
```r
#prep the data
data <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n())
data <- reshape2::melt(data, id.vars = c("unique_alluvium_entires", "freq"))
data$variable <- factor(data$variable, levels = c("label_1", "shared_label", "label_2"))
#ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow() +
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
```
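(Apparently I can't embed images yet.)
As you can see, I can remove the `NA` values, but `shared_label` is not "stacking" properly. In the `shared_label` column, each unique row should be stacked on top of the others. This would also fix the sizing problem, making the rows equal in size along the y-axis.
Is there any way to fix this? I tried `ggsankey`, but the same problem came up and I couldn't remove the `NA` values. Any tips are greatly appreciated!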
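The sizing comes from the weighting. On each axis, a stratum's height is the sum of the `y` values of its rows, and `freq` repeats the group's row count on every row of that group. The following sketch checks the arithmetic with dplyr alone, reusing the question's toy data. `weighted` and `heights` are illustrative names.

```r
library(dplyr)

# Toy data from the question (stringsAsFactors = FALSE keeps labels as character)
data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Same weighting step as the question: freq = row count of each shared_label group
weighted <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n()) %>%
  ungroup()

# A stratum's height is the sum of y over its rows, so with y = freq
# each shared_label stratum ends up with height = (row count)^2
heights <- weighted %>%
  group_by(shared_label) %>%
  summarise(rows = n(), height = sum(freq))

print(heights)
```

The `shared_label` strata come out as 9, 4 and 25 (3², 2², 5²) rather than 3, 2 and 5. Suppose the intent is for every row to contribute equally. Then mapping a constant weight such as `y = 1` instead of `y = freq` should make each stratum's height equal to its row count. This suggestion rests on that assumption about the intended design.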
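This plot is the expected result of the "flow" statistical transformation, which is the default for the "flow" geom. (That is, `geom_flow()` is the same as `geom_flow(stat = "flow")`.) It looks like what you want is the "alluvium" stat instead. Below I've used all of your code, but copied and edited only the `ggplot()` call.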
```r
#ggplot
ggplot(data,
       aes(x = variable, stratum = value, alluvium = unique_alluvium_entires,
           y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow(stat = "alluvium") + # <-- specify alternate stat
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(
    axis.text.x = element_text(size = 12, face = "bold")
  )
#> Warning: Removed 2 rows containing missing values (geom_text).
```
Created on 2021-12-10 by the reprex package (v2.0.1)
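`reshape2` is retired in favour of `tidyr`. The sketch below runs the same pipeline end to end, with `tidyr::pivot_longer()` in place of `reshape2::melt()`. It assumes `tidyr` >= 1.0.0 (the release that introduced `pivot_longer()`) is installed. `long` and `p` are illustrative names. The plot is the answer's `ggplot()` call unchanged apart from the input data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)
library(ggalluvial)

data <- data.frame(
  unique_alluvium_entires = 1:10,
  label_1 = c("A", "B", "C", "D", "E", rep(NA, 5)),
  label_2 = c(rep(NA, 5), "F", "G", "H", "I", "J"),
  shared_label = c("a", "b", "c", "c", "c", "c", "c", "a", "a", "b"),
  stringsAsFactors = FALSE
)

# Weight rows by shared_label group size (as in the question), then reshape
# to lodes (long) form: one row per alluvium per axis, NA labels kept
long <- data %>%
  group_by(shared_label) %>%
  mutate(freq = n()) %>%
  ungroup() %>%
  pivot_longer(
    cols = c(label_1, shared_label, label_2),
    names_to = "variable",
    values_to = "value"
  ) %>%
  mutate(variable = factor(variable,
                           levels = c("label_1", "shared_label", "label_2")))

# The answer's plot: flows drawn with the "alluvium" stat
p <- ggplot(long,
            aes(x = variable, stratum = value,
                alluvium = unique_alluvium_entires,
                y = freq, fill = value, label = value)) +
  scale_x_discrete(expand = c(.1, .1)) +
  geom_flow(stat = "alluvium") +
  geom_stratum(color = "grey", width = 1/4, na.rm = TRUE) +
  geom_text(stat = "stratum", size = 4) +
  theme_void() +
  theme(axis.text.x = element_text(size = 12, face = "bold"))

print(p)
```

`pivot_longer()` keeps the `NA` labels as rows by default, just as `melt()` does. A similar "Removed 2 rows" warning from `geom_text()` is therefore to be expected.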