Graphing a social network from data in a data frame

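I have a data frame of 380 observations of 9 variables. The data represent collaborations between people working on similar projects. The first column is the main node, and the other columns hold the people he or she collaborated with on a project, one person per column. So if the researcher in row 1, column 1 collaborated with five people, their names appear in five columns, and if the researcher in row 2, column 1 collaborated with three people, their names appear in three columns. Obviously there will be many empty cells, because not every researcher collaborated with the same number of people. Given this data, how can I plot it as a network graph?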

Example data frame:

data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

I tried using graph.data.frame, but it only gave me the connections between the first two columns.

We can try the ggraph package, but first we have to tidy the data.

# this is your data
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))

# load the packages used below
library(dplyr)      # for group_by(), summarise(), filter() and %>%
library(tidyr)      # to tidy the data
library(ggraph)     # to plot network data with the semantics of ggplot
library(tidygraph)  # to work with networks
library(ggrepel)    # to avoid overlapping labels

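First, you should prepare the data. Since author_1 is the parent column and the other columns hold its children, you can build the pairs for each author_1 / author_n combination, so that every pair ends up in the same two columns. This clearly also works if your data set is not hierarchical. For each row you need all the parent-child pairs; then rbind() them to combine all the pairs into one table (easier to do than to explain).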

edges <- rbind(
  expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")),  # all observed pairs of columns 1 and 2, renamed a and b
  expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")),  # all observed pairs of columns 1 and 3, renamed a and b
  expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))   # all observed pairs of columns 1 and 4, renamed a and b
)
edges
# A tibble: 15 x 2
   a        b       
   <fct>    <fct>   
 1 Joan     Terrence
 2 John     Joan    
 3 Kerry    Rick    
 4 Michelle N/A     
 5 Paul     Collin  
 6 Joan     Joan    
 7 John     Terrence
 8 Kerry    Michelle
 9 Michelle Michelle
10 Paul     Paul    
11 Joan     N/A     
12 John     Michelle
13 Kerry    Collin  
14 Michelle N/A     
15 Paul     Phillips

Keep in mind that if you want to plot the N/A entries, leave the data as is; otherwise, add %>% filter(b != 'N/A') at the end.
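Since the real data has 9 columns, writing one expand() call per column gets repetitive. The following is a minimal sketch of an alternative, assuming tidyr 1.0.0 or later (for pivot_longer()); the names edges_long and source_column are made up for illustration. It reshapes every column other than author_1 into a single b column and drops the N/A entries in the same pipeline. Unlike expand(), it does not de-duplicate repeated rows, which makes no difference for the example data because each author_1 value appears only once.

```r
library(dplyr)
library(tidyr)

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE)

# one row per (author_1, collaborator) pair, whatever the number of columns
edges_long <- data %>%
  pivot_longer(cols = -author_1,
               names_to = "source_column",   # hypothetical helper column, dropped below
               values_to = "b") %>%
  rename(a = author_1) %>%
  filter(b != 'N/A') %>%                     # drop the placeholder for "no collaborator"
  select(a, b)
```

The result has the same a and b columns as edges, so the rest of the answer can be applied to it unchanged.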

Now we arrange the data so it can be turned into a graph:

# create edges: count how often each a-b pair occurs
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())

# create nodes
nodes <- rbind(data.frame(researcher = edges$a, n = 1),data.frame(researcher = edges$b, n = 1)) %>%
  group_by(researcher) %>%
  summarise(n = sum(n))

# now replace the names in the edges with their row positions in nodes
edges1$a <- match(edges1$a, nodes$researcher) 
edges1$b <- match(edges1$b, nodes$researcher)

# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
tidy <- tidy %>%
  activate(edges) %>%
  arrange(desc(weight))

# now the plot: there are several options, here is a basic one
ggraph(tidy, layout = "gem") +
  geom_node_point(aes(size = n)) +                        # node size shows the frequency
  geom_edge_link(aes(width = weight),                     # edge thickness shows the frequency
                 arrow = arrow(length = unit(4, 'mm')),   # arrows, if you want
                 end_cap = circle(3, 'mm'), alpha = 0.8) +
  scale_edge_width(range = c(0.2, 2)) +
  geom_text_repel(aes(x = x, y = y, label = researcher))
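The match() calls in the code above turn researcher names into row positions in nodes. As far as I understand tbl_graph(), integer edge endpoints are read as row indices into the node table, which is why the conversion is done. A tiny base-R illustration with made-up vectors (researchers, from and idx are hypothetical names, not part of the answer's data):

```r
# node table order, as it would appear in nodes$researcher
researchers <- c("Joan", "John", "Kerry")

# edge endpoints given by name
from <- c("John", "Kerry", "Joan")

# position of each name in the node table
idx <- match(from, researchers)
idx  # 2 3 1
```

A name that is missing from the node table would come back as NA, so it is worth checking anyNA() on the converted columns before building the graph.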

For reference, these are the edges1 and nodes tables built from the example data:
> edges1
# A tibble: 14 x 3
# Groups:   a [?]
       a     b weight
   <int> <int>  <int>
 1     1     1      1
 2     1     7      1
 3     1     9      1
 4     2     1      1
 5     2     9      1
 6     2     4      1
 7     3     6      1
 8     3     8      1
 9     3     4      1
10     4     7      2
11     4     4      1
12     5     6      1
13     5     5      1
14     5    10      1
> nodes
# A tibble: 10 x 2
   researcher     n
   <fct>      <dbl>
 1 Joan           5
 2 John           3
 3 Kerry          3
 4 Michelle       6
 5 Paul           4
 6 Collin         2
 7 N/A            3
 8 Rick           1
 9 Terrence       2
10 Phillips       1
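On the original attempt with graph.data.frame: igraph's graph_from_data_frame() (the current name of that function) reads the first two columns as the edge list and treats any further columns as edge attributes, which explains why the wide table only produced author_1 to author_2 links. Once the data is in the long two-column form it can be passed in directly. A minimal sketch, assuming the igraph package is installed and using a small hand-written edge list (edge_list and g are illustrative names):

```r
library(igraph)

# long-format edge list: first two columns are the endpoints,
# any remaining column becomes an edge attribute
edge_list <- data.frame(
  a = c('John', 'John', 'Kerry'),
  b = c('Joan', 'Terrence', 'Rick'),
  weight = c(1, 1, 1))

g <- graph_from_data_frame(edge_list, directed = TRUE)

vcount(g)  # number of distinct researchers
ecount(g)  # number of collaborations
```

Calling plot(g) would then draw the network with igraph's default layout.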