How to make a googleVis multiple Sankey from a data.frame?
Aim
My aim is to make a multi-level Sankey in R using the googleVis package. The output should look similar to this:
Data
I created some dummy data in R:
set.seed(1)
source <- sample(c("North","South","East","West"),100,replace=T)
mid <- sample(c("North ","South ","East ","West "),100,replace=T)
destination <- sample(c("North  ","South  ","East  ","West  "),100,replace=T) # N.B. The destination labels need trailing spaces (two here, one for mid) so that no label is shared between stages, which would create a cycle
dummy <- rep(1,100) # For aggregation
dat <- data.frame(source,mid,destination,dummy)
aggdat <- aggregate(dummy~source+mid+destination,dat,sum)
What I have tried so far
If I had only a source and a destination, with no midpoint, I could build a Sankey from 2 variables:
aggdat <- aggregate(dummy~source+destination,dat,sum)
library(googleVis)
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
plot(p)
The code produces this:
Question
How do I modify
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
so that it also accepts the mid variable?
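The function gvisSankey does accept mid-levels directly, but these levels have to be encoded in the underlying data: every label must be unique to its stage.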
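To illustrate what "encoded in the underlying data" means, here is a minimal hand-written sketch. The data frame `links` and its values are made up for illustration. The point is that a mid-level node such as "Hub" appears in the `To` column of some rows and in the `From` column of others.

```r
# Hypothetical edge list: each row is one link of the Sankey.
links <- data.frame(
  From   = c("A", "B", "Hub", "Hub"),
  To     = c("Hub", "Hub", "X", "Y"),
  Weight = c(3, 2, 4, 1),
  stringsAsFactors = FALSE
)

# Nodes that occur both as a target and as a source are the mid-levels.
mid_nodes <- intersect(links$From, links$To)
print(mid_nodes)

# Build the chart only if googleVis is installed, so the sketch runs without it.
if (requireNamespace("googleVis", quietly = TRUE)) {
  p <- googleVis::gvisSankey(links, from = "From", to = "To", weight = "Weight")
  # plot(p)  # opens the chart in a browser
}
```

Nothing tells gvisSankey which column is the "mid" stage; it follows from the node names alone.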
source <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace=T)
mid <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace=T)
destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace=T)
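dummy <- rep(1,100) # For aggregation
dat <- data.frame(source, mid, destination, dummy) # Rebuild dat so that it uses the relabelled levels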
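The relabelling can also be done programmatically instead of retyping each level. The following is a minimal sketch in base R. The helper name `tag_stage` and the vector `regions` are hypothetical, and the sketch assumes all three stages share the same raw labels.

```r
set.seed(1)
regions <- c("North", "South", "East", "West")

# Hypothetical helper: append a stage suffix so labels are unique per stage.
tag_stage <- function(x, suffix) paste0(x, suffix)

source      <- tag_stage(sample(regions, 100, replace = TRUE), "Src")
mid         <- tag_stage(sample(regions, 100, replace = TRUE), "Mid")
destination <- tag_stage(sample(regions, 100, replace = TRUE), "Des")
dat <- data.frame(source, mid, destination, dummy = 1)

# A label shared between stages would create a cycle, so check for duplicates.
all_labels <- c(unique(source), unique(mid), unique(destination))
stopifnot(anyDuplicated(all_labels) == 0)
```

The final check fails early if two stages ever end up with the same label, which is easier to diagnose than a chart that does not render.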
Now, we will reshape the original data:
library(dplyr)
datSM <- dat %>%
group_by(source, mid) %>%
summarise(toMid = sum(dummy) ) %>%
ungroup()
The data frame datSM summarises the number of units going from Source to Mid.
datMD <- dat %>%
group_by(mid, destination) %>%
summarise(toDes = sum(dummy) ) %>%
ungroup()
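The data frame datMD summarises the number of units going from Mid to Destination. It will be appended to the final data frame. Both data frames need to be ungrouped and must have the same colnames.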
colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")
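Because datMD is appended after datSM, and the Mid labels appear in the To column of datSM and in the From column of datMD, gvisSankey will automatically recognise the step in between.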
datVis <- rbind(datSM, datMD)
p <- gvisSankey(datVis, from="From", to="To", weight="Dummy") # weight must match the column name set above ("Dummy", capital D)
plot(p)
The plot looks like this:
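The same stacking idea extends to any number of stages. The sketch below is a base-R alternative to the dplyr steps above. The function name `stack_links` and its arguments are hypothetical and not part of googleVis. It aggregates each pair of neighbouring columns and binds the results into one From/To/Weight table.

```r
# Hypothetical helper: build a From/To/Weight edge list from consecutive stage columns.
stack_links <- function(data, stages, weight) {
  pairs <- lapply(seq_len(length(stages) - 1), function(i) {
    agg <- aggregate(
      data[[weight]],
      by = list(From = as.character(data[[stages[i]]]),
                To   = as.character(data[[stages[i + 1]]])),
      FUN = sum
    )
    names(agg)[3] <- "Weight"  # aggregate() names the value column "x"
    agg
  })
  do.call(rbind, pairs)
}

set.seed(1)
n <- 100
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), n, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), n, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), n, replace = TRUE),
  dummy       = 1
)

datVis <- stack_links(dat, c("source", "mid", "destination"), "dummy")

# Each of the two stage transitions carries all 100 units.
print(sum(datVis$Weight))

# Build the chart only if googleVis is installed.
if (requireNamespace("googleVis", quietly = TRUE)) {
  p <- googleVis::gvisSankey(datVis, from = "From", to = "To", weight = "Weight")
  # plot(p)  # opens the chart in a browser
}
```

Adding a fourth stage would only require another column name in `stages`, provided its labels are again unique to that stage.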