Graphing a social network graph from data in a data frame
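I have a data frame with 380 observations of 9 variables. The data represent collaborations between people working on similar projects. The first column is the main node, and the other columns hold the people s/he collaborated with on a project, one person per column. So if the researcher in row 1, column 1 collaborated with five people, their names appear in five columns; if the researcher in row 2, column 1 collaborated with 3 people, their names appear in three other columns. Obviously there will be many empty cells, since not all researchers collaborated with the same number of people. Given this data, how can I plot it as a network graph?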
Example data frame:
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
I tried using graph.data.frame, but it only gives the connections between the first two columns.
We can try the ggraph package, but first we have to tidy the data.
# these are your data
data <- data.frame(
author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'))
# load the packages we need
library(tidyr) # to tidy the data (expand(), nesting())
library(dplyr) # group_by(), summarise(), arrange(), desc()
library(ggraph) # to plot network data with ggplot2 semantics
library(tidygraph) # to work with networks
library(ggrepel) # to avoid overlapping labels
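First, prepare the data. Since you have a "father" column, author_1, and "sons", you can build the pairs for each combination of author_1 and author_n, so that everything ends up in a single pair of columns. This obviously also works if your dataset is not hierarchical. For each row you need every father-son pair; then rbind() them together to merge all the combinations (it is easier to do than to explain).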
edges <- rbind(
expand(data, nesting(author_1, author_2)) %>% `colnames<-`(c("a", "b")), # author_1 with author_2: all combinations, columns renamed a and b
expand(data, nesting(author_1, author_3)) %>% `colnames<-`(c("a", "b")), # author_1 with author_3
expand(data, nesting(author_1, author_4)) %>% `colnames<-`(c("a", "b"))  # author_1 with author_4
)
edges
# A tibble: 15 x 2
a b
<fct> <fct>
1 Joan Terrence
2 John Joan
3 Kerry Rick
4 Michelle N/A
5 Paul Collin
6 Joan Joan
7 John Terrence
8 Kerry Michelle
9 Michelle Michelle
10 Paul Paul
11 Joan N/A
12 John Michelle
13 Kerry Collin
14 Michelle N/A
15 Paul Phillips
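The three hardcoded expand() calls get tedious with the 9 columns of the real data. Below is a sketch of the same reshaping with tidyr::pivot_longer(), a swapped-in technique rather than the answer's own, assuming tidyr >= 1.0.0 and dplyr are installed. It turns every co-author column into rows in one step. One difference to keep in mind: expand(nesting()) keeps only the distinct pairs within each column pair, whereas pivot_longer() keeps every row, so a pair repeated across rows would be counted twice in the weights computed later.

```r
library(tidyr)  # pivot_longer() needs tidyr >= 1.0.0
library(dplyr)  # select()

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE
)

# Every column except author_1 becomes one (author_1, co-author) row,
# so this works unchanged for 4 columns or for the 9 in the real data.
edges <- data %>%
  pivot_longer(-author_1, names_to = "column", values_to = "b") %>%
  select(a = author_1, b)  # keep the same a/b names as the answer

edges
```

With the example data this gives the same 15 pairs as the rbind() version, only ordered row by row instead of column by column.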
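Keep in mind: if you want to plot the N/A entries, leave the edges as they are; otherwise, add %>% filter(b != 'N/A') at the end of the rbind() call that builds edges.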
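The filter(b != 'N/A') suggestion removes only the literal string "N/A". Depending on how the real data was imported, its empty cells may instead be genuine NA values or empty strings "". dplyr's filter() drops rows where the condition evaluates to NA, so real NAs disappear as well, but "" would survive and show up as a node with an empty name. A hedged base-R sketch that removes all three cases (the small edge list is hypothetical); the dplyr equivalent would be filter(!is.na(b), !b %in% c("N/A", "")).

```r
# Hypothetical edge list with the three kinds of "missing" co-author
# a spreadsheet import can produce: the string "N/A", NA and "".
edges <- data.frame(
  a = c("John", "Michelle", "Joan", "Kerry"),
  b = c("Joan", "N/A", NA, ""),
  stringsAsFactors = FALSE
)

# TRUE where b is not a real name (NA is tested first, since NA %in% ... is FALSE)
missing_b <- is.na(edges$b) | edges$b %in% c("N/A", "")

# keep only rows whose co-author is a real name
edges_clean <- edges[!missing_b, ]
edges_clean
```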
Now we reshape the data so they can go into a graph:
# create edges: one row per distinct (a, b) pair, weight = number of times it occurs
edges1 <- edges %>% group_by(a, b) %>% summarise(weight = n())
# create nodes: n = number of times each name appears at either end of an edge
nodes <- rbind(data.frame(researcher = edges$a, n = 1), data.frame(researcher = edges$b, n = 1)) %>%
group_by(researcher) %>%
summarise(n = sum(n))
# tbl_graph() expects integer node indices, so replace each name with its row number in nodes
edges1$a <- match(edges1$a, nodes$researcher)
edges1$b <- match(edges1$b, nodes$researcher)
# declare the data as graph data
tidy <- tbl_graph(nodes = nodes, edges = edges1, directed = TRUE)
tidy <- tidy %>%
activate(edges) %>%
arrange(desc(weight))
# now the plot: there are many options, here is a basic one
ggraph(tidy, layout = "gem") +
geom_node_point(aes(size = n)) + # node size = frequency
geom_edge_link(aes(width = weight), # edge thickness = frequency
arrow = arrow(length = unit(4, 'mm')), # arrows, if you want them
end_cap = circle(3, 'mm'), alpha = 0.8) +
scale_edge_width(range = c(0.2, 2)) +
geom_text_repel(aes(x = x, y = y, label = researcher))
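If you would rather stay with igraph, as in the question, the long edge list is also what fixes graph.data.frame (graph_from_data_frame() in current igraph). That function reads the first two columns as edge endpoints and stores any further columns as edge attributes, which is why passing the wide data frame only linked author_1 with author_2. A minimal base-R plus igraph sketch, assuming the igraph package is installed; the column names from/to are illustrative.

```r
library(igraph)

data <- data.frame(
  author_1 = c('John', 'Kerry', 'Michelle', 'Joan', 'Paul'),
  author_2 = c('Joan', 'Rick', 'N/A', 'Terrence', 'Collin'),
  author_3 = c('Terrence', 'Michelle', 'Michelle', 'Joan', 'Paul'),
  author_4 = c('Michelle', 'Collin', 'N/A', 'N/A', 'Phillips'),
  stringsAsFactors = FALSE
)

# Stack author_1 against every other column: one (from, to) row per cell
edges <- do.call(rbind, lapply(names(data)[-1], function(col) {
  data.frame(from = data$author_1, to = data[[col]], stringsAsFactors = FALSE)
}))

# drop missing co-authors ("N/A", NA or empty strings)
edges <- edges[!is.na(edges$to) & !(edges$to %in% c("N/A", "")), ]

# now the first two columns really are the edge list
g <- graph_from_data_frame(edges, directed = TRUE)
plot(g, edge.arrow.size = 0.4)
```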
This should be consistent with data, and the intermediate tables should look like this:
> edges1
# A tibble: 14 x 3
# Groups: a [?]
a b weight
<int> <int> <int>
1 1 1 1
2 1 7 1
3 1 9 1
4 2 1 1
5 2 9 1
6 2 4 1
7 3 6 1
8 3 8 1
9 3 4 1
10 4 7 2
11 4 4 1
12 5 6 1
13 5 5 1
14 5 10 1
> nodes
# A tibble: 10 x 2
researcher n
<fct> <dbl>
1 Joan 5
2 John 3
3 Kerry 3
4 Michelle 6
5 Paul 4
6 Collin 2
7 N/A 3
8 Rick 1
9 Terrence 2
10 Phillips 1
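For comparison, here is a base-R sketch of the edges1 / nodes / match() steps, with aggregate() and table() swapped in for group_by() and summarise() (not the answer's method). It starts from the same 15-row edge list, N/A included, and reproduces the counts printed above: 14 distinct pairs, Michelle to N/A with weight 2, Michelle appearing 6 times. Because table() sorts the names, the integer indices in from/to will not match the printed edges1.

```r
# The 15 (a, b) pairs built by the answer, N/A included
edges <- data.frame(
  a = rep(c("John", "Kerry", "Michelle", "Joan", "Paul"), times = 3),
  b = c("Joan", "Rick", "N/A", "Terrence", "Collin",        # author_2
        "Terrence", "Michelle", "Michelle", "Joan", "Paul",  # author_3
        "Michelle", "Collin", "N/A", "N/A", "Phillips"),     # author_4
  stringsAsFactors = FALSE
)

# edge weight = number of times each (a, b) pair occurs
edges$weight <- 1
edges1 <- aggregate(weight ~ a + b, data = edges, FUN = sum)

# node size = number of times a name appears at either end of an edge
nodes <- as.data.frame(table(researcher = c(edges$a, edges$b)),
                       responseName = "n", stringsAsFactors = FALSE)

# replace names with row positions in nodes, as tbl_graph() expects
edges1$from <- match(edges1$a, nodes$researcher)
edges1$to   <- match(edges1$b, nodes$researcher)
```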