Converting a dataframe to an edgelist

Question

I'm trying to build an adjacency matrix or edge list from some presence/absence data in R. I have a very large data frame (~12k obs. of 196 variables) that looks a bit like this:

test_input<-data.frame(sample_ID=c("samp1","samp2","samp3","samp4","samp5","samp6","samp7"),
                       sp1 = c(1,0,0,1,1,0,1),
                       sp2 = c(1,0,0,1,1,1,1),
                       sp3 = c(0,1,1,0,0,0,1),
                       sp4 = c(0,1,1,0,0,1,0), stringsAsFactors = FALSE)
> test_input
  sample_ID sp1 sp2 sp3 sp4
1     samp1   1   1   0   0
2     samp2   0   0   1   1
3     samp3   0   0   1   1
4     samp4   1   1   0   0
5     samp5   1   1   0   0
6     samp6   0   1   0   1
7     samp7   1   1   1   0

My goal is to get something like this:

> test_output
  col1 col2 freq
1  sp1  sp2    4
2  sp3  sp4    2
3  sp2  sp4    1
4  sp1  sp3    1
5  sp2  sp3    1

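I have seen some nested for-loop approaches like the one here, but with the data frame I have they are very slow (days/weeks to run), and they generate a data frame of every possible presence/absence combination per sample.

Any suggestions for how to tackle this? Preferably in a vectorised/tidyverse kind of way.

Thanks!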

Answer 1

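You can try this approach with combn: take every 2-combination of the sp columns and compute the inner product of each pair. Because the data are 0/1, that inner product is the number of samples in which both species occur, i.e. the co-occurrence frequency: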

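# All unordered pairs of species column names (a 2 x n_pairs character matrix).
# Named sp_pairs rather than names to avoid masking base::names().
sp_pairs <- combn(names(test_input[-1]), 2)
# For each pair of columns, count the samples where both species are present
freq <- combn(test_input[-1], 2, function(x) sum(x[[1]] * x[[2]]))

data.frame(col1 = sp_pairs[1,], col2 = sp_pairs[2,], freq = freq)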

#  col1 col2 freq
#1  sp1  sp2    4
#2  sp1  sp3    1
#3  sp1  sp4    0
#4  sp2  sp3    1
#5  sp2  sp4    1
#6  sp3  sp4    2
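
For a data frame as large as the one described in the question, the same inner products can probably be computed for all pairs at once with a single matrix cross-product instead of one call per pair. The following is only a sketch of that alternative, not part of the original answer; the names m, co, idx and edges are illustrative, and it assumes every column other than sample_ID is a numeric 0/1 species column.

```r
# Same toy data as in the question
test_input <- data.frame(
  sample_ID = paste0("samp", 1:7),
  sp1 = c(1,0,0,1,1,0,1),
  sp2 = c(1,0,0,1,1,1,1),
  sp3 = c(0,1,1,0,0,0,1),
  sp4 = c(0,1,1,0,0,1,0),
  stringsAsFactors = FALSE
)

# Species columns as a numeric matrix (samples x species)
m <- as.matrix(test_input[-1])

# crossprod(m) is t(m) %*% m: entry [i, j] counts the samples
# in which species i and species j are both present
co <- crossprod(m)

# Keep each unordered pair once by taking the strict upper triangle
idx <- which(upper.tri(co), arr.ind = TRUE)
edges <- data.frame(
  col1 = rownames(co)[idx[, "row"]],
  col2 = colnames(co)[idx[, "col"]],
  freq = co[idx],
  stringsAsFactors = FALSE
)
edges
```

The rows come out in column-major order of the upper triangle, so the pair ordering differs from the combn version, but the counts per pair are the same.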

Note: this includes pairs that occur together zero times; filter them out if you don't need them.
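
As a minimal sketch of that filtering step, the snippet below rebuilds the pair table with combn, drops the zero-frequency pairs and sorts by descending frequency to approximate the layout of the desired test_output. The names sp_pairs, edges and the sorting step are illustrative additions, not part of the original answer.

```r
# Same toy data as in the question
test_input <- data.frame(
  sample_ID = paste0("samp", 1:7),
  sp1 = c(1,0,0,1,1,0,1),
  sp2 = c(1,0,0,1,1,1,1),
  sp3 = c(0,1,1,0,0,0,1),
  sp4 = c(0,1,1,0,0,1,0),
  stringsAsFactors = FALSE
)

# Pairwise co-occurrence counts, as in the answer above
sp_pairs <- combn(names(test_input[-1]), 2)
freq <- combn(test_input[-1], 2, function(x) sum(x[[1]] * x[[2]]))
edges <- data.frame(col1 = sp_pairs[1, ], col2 = sp_pairs[2, ], freq = freq,
                    stringsAsFactors = FALSE)

# Drop pairs that never co-occur
edges <- edges[edges$freq > 0, ]

# Sort so the most frequent pairs come first, then reset the row names
edges <- edges[order(-edges$freq), ]
rownames(edges) <- NULL
edges
```

On the toy data this leaves five pairs, with sp1/sp2 (frequency 4) first and sp3/sp4 (frequency 2) second; the order among the tied frequency-1 pairs may differ from the listing in the question.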
