How to create pairs from a single column counting the occurrence in R?

Question

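I am creating edge files for a social network analysis based on IMDb data. I have run into a problem that I cannot figure out how to solve, as I am new to R.

Assume I have the following data frame: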

movieID <- c('A', 'A','A', 'B','B', 'C','C', 'C')
crewID <- c('Z', 'Y', 'X', 'Z','V','V', 'X', 'Y')
rating <- c('7.3','7.3', '7.3', '2.1', '2.1', '9.0','9.0', '9.0')
df <- data.frame(movieID, crewID, rating)

movieID	crewID	rating
A	Z	7.3
A	Y	7.3
A	X	7.3
B	Z	2.1
B	V	2.1
C	V	9.0
C	X	9.0
C	Y	9.0

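I am trying to build unique pairs of crew IDs within each movie, with a weight equal to the number of occurrences of that pair, i.e. how often those two crew members worked on a movie together. So basically I would like a data frame like the following: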

CrewID1	CrewID2	weight	(not a col but explanation)
Z	Y	1	together once in movie A
Z	X	1	together once in movie A
Y	X	2	together twice in movies A and C
Z	V	1	together once in movie B
V	X	1	together once in movie C
V	Y	1	together once in movie C

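The pairs (Z,Y) and (Y,Z) count as the same pair, because I do not care about direction.

I found the following Stack Overflow thread on a similar problem:

However, in my case this skips the combinations (V,Y) and (X,Z), and the count for (X,Y) is still 1. I do not know how to fix it.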

Answer 1

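# Cross-tabulate movie by crew (dropping rating), then count shared movies per crew pair
m <- crossprod(table(df[-3]))
# Zero the upper triangle and diagonal so each unordered pair appears once
m[upper.tri(m, diag = TRUE)] <- 0
# Convert to long format and keep only pairs that actually co-occur
subset(as.data.frame.table(m), Freq > 0)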

   crewID crewID.1 Freq
2       X        V    1
3       Y        V    1
4       Z        V    1
7       Y        X    2
8       Z        X    1
12      Z        Y    1
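
To see why this works, it may help to name the intermediate objects. The following is a minimal sketch and not part of the original answer: `inc`, `co` and `edges` are names chosen here for illustration, and the final renaming to `CrewID1`, `CrewID2`, `weight` simply mirrors the layout requested in the question. It assumes each crew member is listed at most once per movie, so that the cross-tabulation only contains 0 and 1.

```r
# Example data from the question (rating is not needed for the pairing)
df <- data.frame(
  movieID = c("A", "A", "A", "B", "B", "C", "C", "C"),
  crewID  = c("Z", "Y", "X", "Z", "V", "V", "X", "Y")
)

# Movie-by-crew incidence matrix: 1 if the crew member worked on the movie
inc <- table(df[c("movieID", "crewID")])

# crossprod(inc) is t(inc) %*% inc: entry [i, j] counts the movies
# that crew members i and j have in common
co <- crossprod(inc)

# Keep each unordered pair once by zeroing the upper triangle and diagonal
co[upper.tri(co, diag = TRUE)] <- 0

# Long format, restricted to pairs that co-occur at least once
edges <- subset(as.data.frame.table(co, stringsAsFactors = FALSE), Freq > 0)
names(edges) <- c("CrewID1", "CrewID2", "weight")
edges
```

The diagonal of `crossprod(inc)` holds the number of movies per crew member, which is why it is zeroed together with the upper triangle.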

Answer 2

Perhaps not the most efficient solution, but here is one way to do it:

# The pipe operator %>% comes from magrittr
library(magrittr)

# Define a function that generates pairs of ids
make_pairs <- function(data) {
    # Extract all ids in the movie
    data$crew %>%
        # Organize them alphabetically
        sort() %>%
        # Generate all unique pairs
        combn(2) %>%
        # Prep for map
        as.data.frame() %>%
        # Generate pairs as single string
        purrr::map_chr(stringr::str_flatten, '_')
}
# Generate the data
tibble::tibble(
    movie = c('A', 'A', 'A', 'B', 'B', 'C', 'C', 'C'),
    crew = c('Z', 'Y', 'X', 'Z', 'V', 'V', 'X', 'Y')
) %>%
    # Nest the data so all ids in one movie get put together
    tidyr::nest(data = -movie) %>%
    # Generate pairs of interactions
    dplyr::mutate(
        pairs = purrr::map(data, make_pairs)
    ) %>%
    # Expand all pairs
    tidyr::unnest(cols = pairs) %>%
    # Separate them into unique columns
    tidyr::separate(pairs, c('id1', 'id2')) %>%
    # Count the number of times two ids co-occur
    dplyr::count(id1, id2)

# A tibble: 6 x 3
  id1   id2       n
  <chr> <chr> <int>
1 V     X         1
2 V     Y         1
3 V     Z         1
4 X     Y         2
5 X     Z         1
6 Y     Z         1
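
The same per-movie pairing idea can also be sketched in base R. This is not from the original answer, only an illustration under stated assumptions: `crew_by_movie`, `pairs` and `edges` are names made up here, and movies with fewer than two crew members are dropped because `combn(x, 2)` needs at least two elements.

```r
# Example data from the question
df <- data.frame(
  movieID = c("A", "A", "A", "B", "B", "C", "C", "C"),
  crewID  = c("Z", "Y", "X", "Z", "V", "V", "X", "Y")
)

# Crew ids per movie, sorted so every unordered pair has one canonical order
crew_by_movie <- lapply(
  split(as.character(df$crewID), df$movieID),
  function(x) sort(unique(x))
)

# combn() needs at least two elements, so drop single-person movies
crew_by_movie <- Filter(function(x) length(x) >= 2, crew_by_movie)

# One two-column matrix of pairs per movie, stacked into a single matrix
pairs <- do.call(rbind, lapply(crew_by_movie, function(x) t(combn(x, 2))))
pairs <- data.frame(id1 = pairs[, 1], id2 = pairs[, 2], n = 1L)

# Sum the indicator column to count the movies each pair shares
edges <- aggregate(n ~ id1 + id2, data = pairs, FUN = sum)
edges
```

Sorting the ids before calling `combn()` plays the same role as the alphabetical `sort()` step in `make_pairs`: it ensures that (Z,Y) and (Y,Z) end up as the same row.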

r

network-analysis

edge-list