How to create a network if I only have the edge names?
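I'm trying to connect authors who were cited in the same process. My nodes are the authors and the edges are the processes, but I don't know how to create the edge list.
What I have now ('Doutrina' is the author, 'Numero' is the process number):
What I want is something like this (here 'N' is how many times the connection occurred, i.e. how many times the two authors were cited together):
Sample data: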
library(dplyr)
df <- tribble(
~Doutrina, ~Numero,
"MILARE, 2014", "1009526-53.2015.8.26.0032",
"SEGUIN, 2000", "0054387-89.2011.8.26.0224",
"SILVA, 2009", "0054387-89.2011.8.26.0224",
"MILARE, 2015", "0000351-14.2013.8.26.0326",
"SILVA, 2011", "0000351-14.2013.8.26.0326",
"MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
"SILVA, 2009", "0000431-26.2013.8.26.0698",
"SEGUIN, 2000", "0000431-26.2013.8.26.0698",
"SILVA, 2009", "0054391-29.2011.8.26.0224",
"SEGUIN, 2000", "0054391-29.2011.8.26.0224",
"MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
"MILARE, 2015", "0012360-28.2010.8.26.0224"
)
df
#> # A tibble: 12 x 2
#> Doutrina Numero
#> <chr> <chr>
#> 1 MILARE, 2014 1009526-53.2015.8.26.0032
#> 2 SEGUIN, 2000 0054387-89.2011.8.26.0224
#> 3 SILVA, 2009 0054387-89.2011.8.26.0224
#> 4 MILARE, 2015 0000351-14.2013.8.26.0326
#> 5 SILVA, 2011 0000351-14.2013.8.26.0326
#> 6 MAXIMILIANO, 1961 0000351-14.2013.8.26.0326
#> 7 SILVA, 2009 0000431-26.2013.8.26.0698
#> 8 SEGUIN, 2000 0000431-26.2013.8.26.0698
#> 9 SILVA, 2009 0054391-29.2011.8.26.0224
#> 10 SEGUIN, 2000 0054391-29.2011.8.26.0224
#> 11 MAXIMILIANO, 2015 0012360-28.2010.8.26.0224
#> 12 MILARE, 2015 0012360-28.2010.8.26.0224
I modified your sample data so that the results are more interesting.
library(dplyr)
df <- tribble(
~Doutrina, ~Numero,
"MILARE, 2014", "1009526-53.2015.8.26.0032",
"SEGUIN, 2000", "0054387-89.2011.8.26.0224",
"SILVA, 2009", "0054387-89.2011.8.26.0224",
"MILARE, 2015", "0000351-14.2013.8.26.0326",
"SILVA, 2011", "0000351-14.2013.8.26.0326",
"MAXIMILIANO, 1961", "0000351-14.2013.8.26.0326",
"SILVA, 2009", "0000431-26.2013.8.26.0698",
"SEGUIN, 2000", "0000431-26.2013.8.26.0698",
"SILVA, 2009", "0054391-29.2011.8.26.0224",
"SEGUIN, 2000", "0054391-29.2011.8.26.0224",
"MAXIMILIANO, 2015", "0012360-28.2010.8.26.0224",
"MILARE, 2015", "0012360-28.2010.8.26.0224"
)
df %>%
  mutate(Doutrina = sub(", [0-9]{4}", "", Doutrina)) %>% # remove the year
  full_join(x = ., y = ., by = "Numero") %>% # join data to itself by Numero
  select(Doutrina = Doutrina.x, Doutrina2 = Doutrina.y) %>% # keep only name columns
  filter(Doutrina != Doutrina2) %>% # remove self-reference rows
  filter(Doutrina < Doutrina2) %>% # only keep rows for one direction of edge/link
  group_by(Doutrina, Doutrina2) %>% # one group per author pair
  summarise(N = n(), .groups = "drop") # count how often each pair was cited together
#> # A tibble: 4 x 3
#> Doutrina Doutrina2 N
#> <chr> <chr> <int>
#> 1 MAXIMILIANO MILARE 2
#> 2 MAXIMILIANO SILVA 1
#> 3 MILARE SILVA 1
#> 4 SEGUIN SILVA 3
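If the goal is an actual network object rather than just the edge list, one possible next step is to hand that table to igraph. This is only a sketch under a few assumptions: the igraph package is installed, the edge list is rebuilt by hand here from the output above (the name `edges` is mine), and treating `N` as the edge weight is one reasonable choice, not the only one.

```r
library(igraph)

# Edge list copied from the result above; `edges` is a name chosen for this sketch
edges <- data.frame(
  Doutrina  = c("MAXIMILIANO", "MAXIMILIANO", "MILARE", "SEGUIN"),
  Doutrina2 = c("MILARE", "SILVA", "SILVA", "SILVA"),
  N         = c(2, 1, 1, 3),
  stringsAsFactors = FALSE
)

# The first two columns are read as edge endpoints; any further
# columns (here N) are stored as edge attributes
g <- graph_from_data_frame(edges, directed = FALSE)

# Assumption: use the co-citation count as the edge weight
E(g)$weight <- E(g)$N

vcount(g) # number of authors (nodes)
#> [1] 4
ecount(g) # number of author pairs (edges)
#> [1] 4
```

From there `plot(g, edge.width = E(g)$weight)` should draw thicker lines for pairs that were cited together more often.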