Creating an Adjacency Matrix in R

Question

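I have read a lot about creating adjacency matrices in R, but I have a slightly unusual requirement and cannot work out the data structure.

I have a dataset that looks like this.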

Case	Person	Status
ABC01	99999	Plaintiff
ABC01	11111	Defendant
ABC02	22222	Plaintiff
ABC02	99999	Defendant
ABC03	33333	Plaintiff
ABC03	44444	Defendant
ABC04	55555	Plaintiff
ABC04	66666	Defendant
ABC05	99999	Plaintiff
ABC05	88888	Defendant
ABC06	77777	Plaintiff
ABC06	22222	Defendant
ABC07	11111	Plaintiff
ABC07	44444	Defendant
ABC08	44444	Plaintiff
ABC08	99999	Defendant

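Hopefully the columns are self-explanatory. The adjacency-matrix output should look like this: each case (which is dropped from the final table) contributes one plaintiff/defendant pairing, and the output gives a count for every unique pairing of parties in their roles. This will allow network analysis of plaintiffs and defendants.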

Plaintiff	Defendant	Count
99999	11111	1
22222	99999	1
33333	44444	1
55555	66666	1
99999	88888	1
77777	22222	1
11111	44444	1
44444	99999	1

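Note that a party's ID repeats when that party switches between the plaintiff and defendant roles. However, if there are multiple lawsuits between the same plaintiff and the same defendant, that should be reflected in the count.

This is my current solution: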

df %>%
  group_by(Case, Person) %>%
  mutate(count = n()) %>%
  ungroup() %>%
  mutate(row = row_number()) %>%
  spread(Status, count)

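The problem with this solution is that the rows are offset: the plaintiff and the defendant of a case end up on separate rows, each with NA in the other role's column.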

Crime Reference Number	Person Record URN (ACN)	row	Plantiff	Defendant
ACB01	8645499	1610	1	NA
ACB02	8620113	1456	NA	1
ACB02	8708027	1457	1	NA
ACB03	8667531	1455	1	NA
ACB04	8650244	1458	1	NA
ACB05	8613947	1214	1	NA
ACB06	9074764	1022	1	NA
ACB07	8949458	1459	1	NA

Any help would be greatly appreciated.

Answer 1

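We can turn Case into an id column and reshape the data to wide format, so that each case becomes one row with a Plaintiff and a Defendant column. Then count() tallies how many times each Plaintiff/Defendant combination occurs.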

library(dplyr)
library(tidyr)

df %>%
  mutate(Case = dense_rank(Case)) %>%
  pivot_wider(names_from = Status, values_from = Person) %>%
  count(Plaintiff, Defendant)

#  Plaintiff Defendant     n
#      <int>     <int> <int>
#1     11111     44444     1
#2     22222     99999     1
#3     33333     44444     1
#4     44444     99999     1
#5     55555     66666     1
#6     77777     22222     1
#7     99999     11111     1
#8     99999     88888     1
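The result above is an edge list rather than a square matrix. If a literal adjacency matrix is needed, one possible base R sketch is shown here. It assumes exactly one plaintiff and one defendant per case, as in the sample data; the names p, d, edges, parties and adj are illustrative and not part of the original answer.

```r
# Rebuild the sample data from the question (same values as the Data section).
df <- data.frame(
  Case   = rep(sprintf("ABC%02d", 1:8), each = 2),
  Person = c(99999L, 11111L, 22222L, 99999L, 33333L, 44444L, 55555L, 66666L,
             99999L, 88888L, 77777L, 22222L, 11111L, 44444L, 44444L, 99999L),
  Status = rep(c("Plaintiff", "Defendant"), times = 8),
  stringsAsFactors = FALSE
)

# Split by role and join on Case, giving one row per case with both parties.
# Assumption: every case has exactly one plaintiff and one defendant.
p <- df[df$Status == "Plaintiff", c("Case", "Person")]
d <- df[df$Status == "Defendant", c("Case", "Person")]
edges <- merge(p, d, by = "Case", suffixes = c("_plaintiff", "_defendant"))

# Give both roles the same factor levels so the cross-tabulation is square.
parties <- sort(unique(df$Person))
adj <- table(
  Plaintiff = factor(edges$Person_plaintiff, levels = parties),
  Defendant = factor(edges$Person_defendant, levels = parties)
)

adj
```

Rows are plaintiffs and columns are defendants, so the matrix is directed: adj["99999", "11111"] counts suits brought by 99999 against 11111, not the reverse.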

Answer 2

We can use data.table:

library(data.table)
dcast(setDT(df), frank(Case, ties.method = 'dense') ~ Status,
     value.var = 'Person')[, .(n = .N), .(Plaintiff, Defendant)]

-output

  Plaintiff Defendant n
1:     99999     11111 1
2:     22222     99999 1
3:     33333     44444 1
4:     55555     66666 1
5:     99999     88888 1
6:     77777     22222 1
7:     11111     44444 1
8:     44444     99999 1
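Every count in the sample output is 1 because no plaintiff/defendant pairing repeats in the sample data. To check that repeated suits are tallied as the question requires, here is a small base R sketch. The extra case ABC09 is hypothetical and added only for illustration, and the names df2, wide and counts are likewise assumptions.

```r
# Rebuild the sample data from the question (same values as the Data section).
df <- data.frame(
  Case   = rep(sprintf("ABC%02d", 1:8), each = 2),
  Person = c(99999L, 11111L, 22222L, 99999L, 33333L, 44444L, 55555L, 66666L,
             99999L, 88888L, 77777L, 22222L, 11111L, 44444L, 44444L, 99999L),
  Status = rep(c("Plaintiff", "Defendant"), times = 8),
  stringsAsFactors = FALSE
)

# Hypothetical extra case (not in the original data): 99999 sues 11111 again.
df2 <- rbind(
  df,
  data.frame(Case = "ABC09", Person = c(99999L, 11111L),
             Status = c("Plaintiff", "Defendant"), stringsAsFactors = FALSE)
)

# One row per case; the columns become Person.Plaintiff and Person.Defendant.
wide <- reshape(df2, idvar = "Case", timevar = "Status", direction = "wide")

# Count how many cases share each plaintiff/defendant pairing.
wide$n <- 1L
counts <- aggregate(n ~ Person.Plaintiff + Person.Defendant,
                    data = wide, FUN = sum)

counts
```

With the hypothetical ABC09 included, the pairing of plaintiff 99999 and defendant 11111 should carry a count of 2 while all other pairings stay at 1.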

Data

df <- structure(list(Case = c("ABC01", "ABC01", "ABC02", "ABC02", "ABC03", 
"ABC03", "ABC04", "ABC04", "ABC05", "ABC05", "ABC06", "ABC06", 
"ABC07", "ABC07", "ABC08", "ABC08"), Person = c(99999L, 11111L, 
22222L, 99999L, 33333L, 44444L, 55555L, 66666L, 99999L, 88888L, 
77777L, 22222L, 11111L, 44444L, 44444L, 99999L), Status = c("Plaintiff", 
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant", 
"Plaintiff", "Defendant", "Plaintiff", "Defendant", "Plaintiff", 
"Defendant", "Plaintiff", "Defendant", "Plaintiff", "Defendant"
)), class = "data.frame", row.names = c(NA, -16L))
