How to use an alluvial plot (or Sankey diagram) to show change of categories over time using R
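I am trying to use an alluvial plot (Sankey diagram) to show how different categories change across two time periods. When all factor levels are represented in both time periods (pre and post), I can create a plot that makes sense to me, but with my data the plot looks odd after I change the factor order. I would also like the categories to have the same fill color in both time periods, but I have only been able to change the first period (pre).
When I draw the plot, the colors I assigned to each factor level are not the ones I intended, although the order of the boxes/strata is correct.
Any help or suggestions on how to improve the plot, and on how to order the factor levels for both groups when not every category is represented in both time periods, would be much appreciated.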
The code is as follows:
library(ggplot2)
library(ggalluvial)
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header = TRUE)
head(db)
# Order factor levels
levels(db$pre) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
levels(db$post) <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
# Set colors for the plot
colors.p <- c("#282828", "#7C7C7C", "#20AB5F", "#3EFF00",
              "#FBFF00", "#FFBD00", "#FF0C00")
# Plot
p <- ggplot(db,
            aes(y = freq, axis1 = pre,
                axis2 = post)) +
  geom_alluvium(aes(fill = pre), show.legend = FALSE) +
  geom_stratum(aes(fill = pre), color = "black", alpha = 0.5) +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("previous", "current"),
                   expand = c(0.3, 0.01)) +
  scale_fill_manual(values = colors.p) +
  theme_void() +
  theme(
    panel.background = element_blank(),
    axis.text.y = element_blank(),
    axis.text.x = element_text(size = 15, face = "bold"),
    axis.title = element_blank(),
    axis.ticks = element_blank(),
    legend.position = "none"
  )
p
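I gave this a try with a different package that I am more familiar with (ggsankey). I also removed one category from each time point to show that reordering the factors is still possible in that case.
Does this solve your problem? If not, please clarify what is still missing.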
library(tidyverse)
library(ggsankey)
# Example data: every pre/post combination with a random frequency between 0 and 20
db <- data.frame(pre = rep(c("DD", "LC", "NT",
                             "VU", "EN", "CR"), each = 6),
                 post = rep(c("DD", "LC", "NT",
                              "VU", "EN", "CR"), times = 6),
                 freq = rep(sample(0:20, 6), 6))
db %>%
  uncount(freq) %>%
  filter(pre != "DD", post != "NT") %>%
  make_long(pre, post) %>%
  mutate(node = fct_relevel(node, "LC", "NT", "VU", "EN", "CR"),
         next_node = fct_relevel(next_node, "DD", "LC", "VU", "EN", "CR")) %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = factor(node))) +
  geom_alluvial() +
  scale_fill_manual(values = c("DD" = "#7C7C7C", "LC" = "#20AB5F",
                               "NT" = "#3EFF00", "VU" = "#FBFF00",
                               "EN" = "#FFBD00", "CR" = "#FF0C00"))
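One likely reason the labels and colors in the question's plot come out wrong is the `levels(db$pre) <- ...` assignment. On a factor, that call renames the existing levels in place instead of reordering them, so the data end up relabelled. The minimal base-R sketch below uses a made-up vector `x` (not the question's data) to contrast it with `factor(x, levels = ...)`, which, like `fct_relevel()`, changes only the level order and leaves the values alone.

```r
# Hypothetical toy vector; default factor levels are alphabetical: DD, LC, NE
x <- factor(c("NE", "DD", "LC", "DD"))

# Assigning to levels() RENAMES the levels position by position:
# DD -> NE, LC -> DD, NE -> LC, so the underlying values change
relabelled <- x
levels(relabelled) <- c("NE", "DD", "LC")
as.character(relabelled)

# Rebuilding the factor with explicit levels only REORDERS them;
# the values stay as they were
reordered <- factor(x, levels = c("NE", "DD", "LC"))
as.character(reordered)
levels(reordered)
```

If this is what happened with your data, replacing the two `levels(...) <-` lines with `db$pre <- factor(db$pre, levels = ...)` and the same for `db$post` should at least keep the labels attached to the right rows.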
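Edit: with your new data, the approach I posted earlier still works. You need to add the additional level ("NE") to the factor releveling for the first time point and give it a color of its own (blue in this example). What error do you get with this data?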
library(tidyverse)
library(ggsankey)
db <- read.table(text = "pre post freq
NE NE 0
NE DD 2
NE LC 5
NE NT 2
NE VU 3
NE EN 5
NE CR 1
DD NE 0
DD DD 3
DD LC 37
DD NT 10
DD VU 14
DD EN 3
DD CR 3
LC NE 0
LC DD 0
LC LC 18
LC NT 2
LC VU 1
LC EN 2
LC CR 0
NT NE 0
NT DD 1
NT LC 3
NT NT 8
NT VU 13
NT EN 5
NT CR 1
VU NE 0
VU DD 0
VU LC 1
VU NT 0
VU VU 7
VU EN 8
VU CR 3
EN NE 0
EN DD 0
EN LC 0
EN NT 0
EN VU 0
EN EN 0
EN CR 2
CR NE 0
CR DD 0
CR LC 1
CR NT 0
CR VU 0
CR EN 0
CR CR 2
", header = TRUE)
db %>%
  uncount(freq) %>%
  make_long(pre, post) %>%
  mutate(node = fct_relevel(node, "DD", "LC", "NT", "NE", "VU", "EN", "CR"),
         next_node = fct_relevel(next_node, "DD", "LC", "NT", "VU", "EN", "CR")) %>%
  ggplot(aes(x = x,
             next_x = next_x,
             node = node,
             next_node = next_node,
             fill = factor(node))) +
  geom_alluvial() +
  scale_fill_manual(values = c("DD" = "#7C7C7C", "LC" = "#20AB5F",
                               "NT" = "#3EFF00", "VU" = "#FBFF00",
                               "EN" = "#FFBD00", "CR" = "#FF0C00",
                               "NE" = "blue"))
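If you would rather stay with ggalluvial, the following is a sketch under a few assumptions, not a tested drop-in fix. It builds both columns with `factor(..., levels = ...)` so the order is shared, maps the stratum fill to `after_stat(stratum)` so the boxes on both axes are colored by their own category, and uses a named color vector so each color is tied to a level name and not to a position. The names `status_levels` and `status_colors` are mine, and the data frame is only a small subset of the nonzero rows from the question.

```r
library(ggplot2)
library(ggalluvial)

# Shared level order and a NAMED palette (names are illustrative)
status_levels <- c("NE", "DD", "LC", "NT", "VU", "EN", "CR")
status_colors <- c(NE = "#282828", DD = "#7C7C7C", LC = "#20AB5F",
                   NT = "#3EFF00", VU = "#FBFF00", EN = "#FFBD00",
                   CR = "#FF0C00")

# Subset of the question's nonzero rows, for illustration only
db <- data.frame(
  pre  = c("NE", "NE", "DD", "DD", "LC", "NT", "VU", "EN", "CR"),
  post = c("DD", "LC", "LC", "VU", "LC", "VU", "EN", "CR", "CR"),
  freq = c(2, 5, 37, 14, 18, 13, 8, 2, 2)
)

# Reorder (not rename) the levels; unused levels such as post = "NE" are allowed
db$pre  <- factor(db$pre,  levels = status_levels)
db$post <- factor(db$post, levels = status_levels)

p <- ggplot(db, aes(y = freq, axis1 = pre, axis2 = post)) +
  # Flows keep the color of the category they start from
  geom_alluvium(aes(fill = pre), show.legend = FALSE) +
  # Each box is filled by its own category, on both axes
  geom_stratum(aes(fill = after_stat(stratum)), color = "black", alpha = 0.5) +
  geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
  scale_x_discrete(limits = c("previous", "current"),
                   expand = c(0.3, 0.01)) +
  scale_fill_manual(values = status_colors) +
  theme_void() +
  theme(axis.text.x = element_text(size = 15, face = "bold"),
        legend.position = "none")
p
```

Because the palette is named, a category that is missing from one time period does not shift the colors of the remaining ones.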