How to generate an igraph-compatible edge set in R from data

I have a dataset that currently contains a set of words along with the paragraph each word originally appeared in, like this:

word <- c("wind", "statement", "card", "growth", "egg", "caption", "statement", "robin", "growth")
paragraph <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
data <- data.frame(word, paragraph)

and I'm trying to generate an edge list for igraph from it that connects words based on their co-occurrence within a paragraph, like this:

node1 <- c("wind", "wind", "statement", "statement", "card", "card", "growth", "growth", "egg", "egg", "caption", "caption", "statement", "statement", "robin", "robin", "growth", "growth")
node2 <- c("statement", "card", "wind", "card", "wind", "statement", "egg", "caption", "growth", "caption", "growth", "egg", "robin", "growth", "statement", "growth", "statement", "robin")
edges <- data.frame(node1, node2)

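So far, I've only figured out how to calculate the correlation between each pair of words based on paragraph, using

data <- data %>% group_by(word) %>% pairwise_cor(word, paragraph, sort = TRUE)

from the widyr package, but for the other operations I want to run, I really need the edges to be the actual number of co-occurrences rather than a correlation coefficient. Does anyone know of some code that could solve this for me? Any help would be greatly appreciated!!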

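I'm not quite sure what you mean by "I really need the edges to be the actual number of co-occurrences rather than a correlation coefficient." However, "I'm trying to generate an edge list for igraph from it that connects words based on their co-occurrence within a paragraph" seems clear enough. I interpret that as: two words are linked if they appear in the same paragraph. You can build that kind of edge list with combn, like this: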

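# Collect word pairs paragraph by paragraph
Edges = c()
for(p in unique(data$paragraph)) {
    # combn() returns a 2-row matrix of row indices, one column per pair
    Edges = c(Edges, as.character(data$word)[combn(which(data$paragraph == p), 2)]) }
# Consecutive elements form a pair, so fill the two-column matrix by row
EL = matrix(Edges, ncol=2, byrow=TRUE)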

library(igraph)

g = graph_from_edgelist(EL, directed=FALSE)
plot(g)
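
If what you want is the number of paragraphs in which each pair of words co-occurs, one possible approach is to count the pairs and store the count as an edge weight. The following is only a sketch under my own assumptions: each unordered pair is counted at most once per paragraph, the column names node1, node2 and weight are my choice, and the counting is done with base R's aggregate rather than anything from widyr.

```r
library(igraph)

word <- c("wind", "statement", "card", "growth", "egg", "caption", "statement", "robin", "growth")
paragraph <- c(1, 1, 1, 2, 2, 2, 3, 3, 3)
data <- data.frame(word, paragraph, stringsAsFactors = FALSE)

# One row per unordered word pair per paragraph.
# Sorting the words first means A-B and B-A end up as the same row.
pairs <- do.call(rbind, lapply(split(data$word, data$paragraph), function(w) {
  w <- sort(unique(w))
  if (length(w) < 2) return(NULL)  # a one-word paragraph contributes no pairs
  t(combn(w, 2))
}))

# Count how many paragraphs each pair shares (assumed column names)
pair_df <- data.frame(node1 = pairs[, 1], node2 = pairs[, 2], weight = 1,
                      stringsAsFactors = FALSE)
counts <- aggregate(weight ~ node1 + node2, data = pair_df, FUN = sum)

# Extra columns of the data frame become edge attributes, so the
# co-occurrence count is available as E(g)$weight
g <- graph_from_data_frame(counts, directed = FALSE)
```

With the sample data, every pair shares exactly one paragraph, so all weights are 1; on a larger corpus the weights would vary, and you could pass them to the plot, for example with plot(g, edge.width = E(g)$weight).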