
How does sankeyNetwork set the x-axis position?

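I'm working through documentation and tutorials on building Sankey diagrams in R with networkD3::sankeyNetwork().

I can get it to work using someone else's code (see CJ Yetman's tidyverse approach to networkD3).

When I try to implement it myself, my nodes end up in the wrong order along the x axis, which makes the flow impossible to follow.

However, I can't figure out where sankeyNetwork gets its information about x-axis position.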



#Load the packages used below
library(dplyr)
library(tidyr)
library(networkD3)

#Create the data
df <- data.frame('one' = c('a', 'b', 'b', 'a'), 
                 'two' = c('c', 'd', 'e', 'c'), 
                 'three' = c('f', 'g', 'f', 'f'))

#My code
#Create the links
links <- df %>%
  mutate(row = row_number()) %>% #Get row for grouping and pivoting
  pivot_longer(-row) %>% #pivot to long format
  group_by(row) %>% 
  mutate(source_c = lead(value)) %>% #Get flow 
  filter(!is.na(source_c)) %>% #Get rid of NA
  rename(target_c = value) %>% #Correct names
  group_by(target_c, source_c) %>% #Count frequencies
  summarize(value = n()) %>%
  ungroup() %>%
  mutate(target = as.integer(factor(target_c)), #Convert to numeric values
         source = as.integer(factor(source_c))) %>%
  mutate(source = source - 1, #zero index
         target = target - 1)

#create the nodes
nodes <- data.frame(name = factor(unique(c(links$target_c, links$source_c))))

#plot the network
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'value', NodeID = 'name')



# Working code (CJ Yetman's tidyverse approach)
links <-
  df %>% 
  mutate(row = row_number()) %>%  # add a row id
  gather('col', 'source', -row) %>%  # gather all columns
  mutate(col = match(col, names(df))) %>%  # convert col names to col nums
  mutate(source = paste0(source, '_', col)) %>%  # add col num to node names
  group_by(row) %>%
  arrange(col) %>%
  mutate(target = lead(source)) %>%  # get target from following node in row
  ungroup() %>% 
  filter(!is.na(target)) %>%  # remove links from last column in original data
  select(source, target) %>% 
  group_by(source, target) %>% 
  summarise(value = n())  # aggregate and count similar links

# create nodes data frame from unique nodes found in links data frame
nodes <- data.frame(id = unique(c(links$source, links$target)),
                    stringsAsFactors = FALSE)
# remove column id from names
nodes$name <- sub('_[0-9]*$', '', nodes$id)

# set links data to the 0-based index of the nodes in the nodes data frame
links$source <- match(links$source, nodes$id) - 1
links$target <- match(links$target, nodes$id) - 1

sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'value', NodeID = 'name')


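I know the working code and my code are different, but I can't see where sankeyNetwork picks up the row-number (i.e. x-axis) data: none of the variables passed to the call contains that information. I think I can get my own code to prepare the data once I know what it needs to look like.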

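Like all of the functions in networkD3, sankeyNetwork() determines the x-y position of each node algorithmically, based on its relationship to the other nodes in the network. It does not read x-y values directly from the data.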
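To make the idea concrete, here is a rough base R sketch of how a column could be derived for each node from the links alone. It is only an illustration under simplifying assumptions, not networkD3's actual layout code (which runs in JavaScript via D3 and also handles vertical placement and other details). The helper name node_columns is hypothetical.

```r
# Hypothetical helper: give each node a column by walking the links from
# left to right. Node indices are 0-based, as sankeyNetwork expects.
node_columns <- function(source, target, n_nodes) {
  column <- integer(n_nodes)                          # every node starts in column 0
  current <- setdiff(seq_len(n_nodes) - 1L, target)   # nodes with no incoming link
  depth <- 0L
  while (length(current) > 0 && depth < n_nodes) {    # depth cap guards against cycles
    column[current + 1L] <- depth
    current <- unique(target[source %in% current])    # move on to the nodes they feed
    depth <- depth + 1L
  }
  column
}

# Nodes a..g are indexed 0..6; links: a->c, b->d, b->e, c->f, d->g, e->f
src <- c(0, 1, 1, 2, 3, 4)
tgt <- c(2, 3, 4, 5, 6, 5)
cols <- node_columns(src, tgt, n_nodes = 7)
cols
```

In this sketch the only inputs are the source and target indices, which is why wrong or reversed indices change where nodes end up horizontally.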


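Look at your links data frame and compare it with the df data frame you started with. For example, the first row/link in the links data frame is a->c, but your target and source columns identify it as 0->0. Likewise, the second row/link is b->d, but your target and source columns identify it as 1->1. And so on...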

#   target_c source_c value target source
# 1        a        c     2      0      0
# 2        b        d     1      1      1
# 3        b        e     1      1      2
# 4        c        f     2      2      3
# 5        d        g     1      3      4
# 6        e        f     1      4      3
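The mismatch comes from calling factor() separately on each column. A minimal base R sketch, using the six links from the table above, shows that the same node name receives a different index depending on which column it appears in:

```r
# Link endpoints as they appear in the links data frame above
target_c <- c("a", "b", "b", "c", "d", "e")
source_c <- c("c", "d", "e", "f", "g", "f")

# Each factor() call builds its own, independent set of levels
target_levels <- levels(factor(target_c))
source_levels <- levels(factor(source_c))

# 0-based index given to node "c" in each column
c_as_target <- unique(as.integer(factor(target_c))[target_c == "c"]) - 1
c_as_source <- unique(as.integer(factor(source_c))[source_c == "c"]) - 1
c(c_as_target, c_as_source)
```

Node "c" is index 2 in the target column but index 0 in the source column, so the two columns no longer refer to one shared list of nodes.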

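Also, because you use mutate(source_c = lead(value)) rather than mutate(target = lead(source)) as in the code you copied, you have reversed the direction of your links, so you will get a mirror image of what you are expecting.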
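A small base R sketch of the direction issue, using one row of df as an example. Here head() and tail() stand in for the pairing that lead() produces inside the dplyr chain; the variable names are illustrative only.

```r
path <- c("a", "c", "f")    # one row of df, read left to right

earlier <- head(path, -1)   # each value that has a successor
later   <- tail(path, -1)   # its successor (what lead() returns, minus the trailing NA)

# Left-to-right flow: the earlier value is the source
forward <- data.frame(source = earlier, target = later,
                      stringsAsFactors = FALSE)

# Naming the lead() result as the source reverses every link
reversed <- data.frame(source = later, target = earlier,
                       stringsAsFactors = FALSE)
```

The two data frames contain the same pairs, but every link in reversed points from right to left.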

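If you must set the target and source node IDs in the links data frame inside the dplyr chain, using mutate like that, you can give both factor calls the same levels: all of the unique values from the two columns combined. For example (though you will still have to swap your notion of source and target to get the same result as the copied code)...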


#Create the data
df <- data.frame('one' = c('a', 'b', 'b', 'a'), 
                 'two' = c('c', 'd', 'e', 'c'), 
                 'three' = c('f', 'g', 'f', 'f'))

#My code
#Create the links
links <- 
  df %>%
  mutate(row = row_number()) %>% #Get row for grouping and pivoting
  pivot_longer(-row) %>% #pivot to long format
  group_by(row) %>% 
  mutate(source_c = lead(value)) %>% #Get flow 
  filter(!is.na(source_c)) %>% #Get rid of NA
  rename(target_c = value) %>% #Correct names
  group_by(target_c, source_c) %>% #Count frequencies
  summarize(value = n()) %>%
  ungroup() %>%
  mutate(target = as.integer(factor(target_c, levels = unique(c(target_c, source_c)))), #Convert to numeric values
         source = as.integer(factor(source_c, levels = unique(c(target_c, source_c))))) %>%
  mutate(source = source - 1, #zero index
         target = target - 1)

#create the nodes
nodes <- data.frame(name = factor(unique(c(links$target_c, links$source_c))))

#plot the network
sankeyNetwork(Links = links, Nodes = nodes, Source = 'source',
              Target = 'target', Value = 'value', NodeID = 'name')
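Before plotting, it can help to confirm that the numeric indices really point at the intended node names. The following base R sketch is one way to do that; indices_match is a hypothetical helper, not part of networkD3.

```r
# Hypothetical helper: TRUE when every 0-based index in `idx`
# points at the node whose name is given in `label`.
indices_match <- function(idx, label, node_names) {
  all(as.character(node_names)[idx + 1] == label)
}

node_names <- c("a", "b", "c", "d", "e", "f", "g")
source_c   <- c("c", "d", "e", "f", "g", "f")

# Indices looked up against the shared node list pass the check
good_source <- match(source_c, node_names) - 1
good_ok <- indices_match(good_source, source_c, node_names)

# Indices from a separate factor() call do not
bad_source <- as.integer(factor(source_c)) - 1
bad_ok <- indices_match(bad_source, source_c, node_names)

c(good_ok, bad_ok)
```

With the data frames from this post, the equivalent checks would be indices_match(links$source, links$source_c, nodes$name) and indices_match(links$target, links$target_c, nodes$name).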