Convert a comma-separated adjacency list into a 2-column edge list (to build an igraph object)

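I have searched through a lot of material on Stack Overflow looking for a solution, but I am stuck! I am learning R and igraph by reading and practicing, so please bear with me if the question is too simple :)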

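I have been using the code below to extract co-author text data (an adjacency list) from a Google Scholar profile page, and I want to turn it into a co-authorship network. I had no success with graph_from_adjlist in igraph; it did not build the network correctly. So I changed my approach and tried to convert the data into an edge list first and then use the more common graph_from_edgelist function. I found a solution here; it works fine when the number of rows (publications, in my case) is under 300, but above that it gives this error in R:
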
Error in rep(x[1], length(x) - 1) : invalid 'times' argument
Called from: FUN(X[[i]], ...)
Browse[1]> Q

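Honestly, I don't understand the logic of the code that converts the adjacency list into a 2-column edge list, so I can't work out what the problem is.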

Here is a short piece of my code (I describe each step in the inline comments):

library(scholar)
library(igraph) 
# one scholar profile link (works fine with small number of authors)
scurl <- "https://scholar.google.com/citations?user=nG42BMAAAAAJ&hl=en"
# prof Welman google scholar link as an example that gives the above error
# scurl <- "https://scholar.google.com/citations?user=_q2NODAAAAAJ&hl=en"
citid <- strsplit((strsplit(scurl,"&",fixed = TRUE)[[1]][1]),"=",fixed = TRUE)[[1]][2]
# authors <- as.data.frame(cSplit(subset(get_publications(citid,flush = TRUE),select = "author"),splitCols = "author",sep = ",")) ## this I put to check if authors are extracting in a right way
pub <- get_publications(citid,flush = TRUE)
coauthors <- as.character(tolower(pub$author)) ##to make text differences less effective in result
adjlist=strsplit(coauthors,",") # splits the character strings into list with different vector for each line
col1 <- unlist(lapply(adjlist,function(x) rep(x[1],length(x)-1))) # establish first column of edgelist by replicating the 1st element (=ID number) by the length of the line minus 1 (itself)
col2 <- unlist(lapply(adjlist,"[",-1)) # the second line I actually don't fully understand this command, but it takes the rest of the ID numbers in the character string and transposes it to list vertically
edgelist <- cbind(col1,col2) # creates the edgelist by combining column 1 and 2.
coauthorgraph <- graph_from_edgelist(edgelist,directed = FALSE)
set.seed(333)
coauthorgraph$layout <- layout.circle
tkplot(coauthorgraph)

I tried adding a (times=400) argument in the col2 line, but it didn't help. I would be very grateful for any suggestions.

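One column is every element except the first; the other is the first element repeated (length of the vector - 1) times. You can get the latter with rep(..., times=lengths(adjlist) - 1L). So, picking up after you get pub:
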
## tolower does character conversion, and remove the trailing "..."
coauthors <- sub('[ ,.]+$', '', tolower(pub$author))

## Make edgelist by repeating 1st elements each length(vector)-1L
adjlist <- strsplit(coauthors, '\\s*,\\s*')
edgelist <- cbind(
    unlist(lapply(adjlist, tail, -1L)),                        # col1
    rep(sapply(adjlist, `[`, 1L), times=lengths(adjlist)-1L)   # col2
)

## make graph
g <- graph_from_edgelist(edgelist, directed=FALSE)

## Offset labels a bit: nodes printed from +x-axis counter-clockwise
ord <- V(g)                                               # node order
theta <- seq(0, 2*pi-2*pi/length(ord), 2*pi/length(ord))  # angle
theta[theta>pi] <- -(2*pi - theta[theta>pi])              # convert to (-pi, pi]
dists <- rep(c(1, 0.7), length.out=length(ord))           # alternate distance

## Plot
plot(g, layout=layout.circle, vertex.label.degree=-theta, 
     vertex.label.dist=dists, vertex.label.cex=1.1,
     vertex.size=14, vertex.color='#FFFFCC', edge.color='#E25822')
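
To make the edge list step concrete, here is a small sketch of the same logic on made-up data. The author names are invented and stand in for pub$author; nothing here touches Google Scholar or igraph.

```r
# Toy stand-in for pub$author (invented names, one string per publication)
coauthors <- c("a smith, b jones, c lee", "b jones, d kim")

# One character vector of authors per publication
adjlist <- strsplit(coauthors, "\\s*,\\s*")

first  <- sapply(adjlist, `[`, 1L)     # first author of each publication
others <- lapply(adjlist, tail, -1L)   # every author except the first

# Pair each remaining author with the first author of the same publication
edgelist <- cbind(
    unlist(others),
    rep(first, times = lengths(adjlist) - 1L)
)
print(edgelist)
# rows: ("b jones", "a smith"), ("c lee", "a smith"), ("d kim", "b jones")
```

lapply(adjlist, tail, -1L) does the same job as lapply(adjlist, "[", -1) in the question: both drop the first element of each vector.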

Update

The error comes from trying to pass a negative times argument, which happens when one of the elements in the list contains no authors.
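
A minimal reproduction of that failure, assuming a publication whose author field is an empty string (the data is made up):

```r
# Second entry has no authors at all
adjlist <- strsplit(c("a smith, b jones", ""), "\\s*,\\s*")
print(lengths(adjlist))   # the empty entry has length 0

empty <- adjlist[[2]]

# length(empty) - 1 is -1, which rep() rejects as a 'times' value
msg <- tryCatch(
    rep(empty[1], length(empty) - 1),
    error = function(e) conditionMessage(e)
)
print(msg)
```

Filtering with nzchar() before splitting removes such entries, so the computed times value never drops below zero.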

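To add self-loops and remove the zero-character entries that show up in the new query, first filter the coauthors vector so it only contains non-empty entries, then, in the adjacency list, repeat the elements that have length == 1. The rest should be the same.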

scurl <- "scholar.google.com/citations?user=xqefLxQAAAAJ&hl=en"
citid <- regmatches(scurl, regexpr('(?<=user=)[^&]+', scurl, perl=TRUE))  # [^&]+ also matches IDs containing "_" or "-"
pub <- get_publications(citid, flush=TRUE)

## tolower does character conversion, and remove the trailing "..."
coauthors <- sub('[ ,.]+$', '', tolower(pub$author))
coauthors <- coauthors[nzchar(coauthors)]  # only keep entries that aren't blank

## Add self-loops for single-author entries
adjlist <- strsplit(coauthors, '\\s*,\\s*')
lens <- lengths(adjlist)
adjlist[lens==1L] <- lapply(adjlist[lens==1L], rep, times=2)  # repeat single author entries

Then continue as before.
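
Putting the pieces together, the whole cleaning and edge list step could be sketched as one function. The name build_edgelist and the toy input are hypothetical, used only for illustration; the body follows the steps above and uses base R only.

```r
# Hypothetical helper: author strings in, 2-column edge list out
build_edgelist <- function(authors) {
    # Lower-case and strip a trailing "..." (plus stray spaces or commas)
    authors <- sub("[ ,.]+$", "", tolower(authors))
    authors <- authors[nzchar(authors)]          # drop blank entries

    adjlist <- strsplit(authors, "\\s*,\\s*")

    # Single-author entries become self-loops
    single <- lengths(adjlist) == 1L
    adjlist[single] <- lapply(adjlist[single], rep, times = 2)

    cbind(
        unlist(lapply(adjlist, tail, -1L)),
        rep(sapply(adjlist, `[`, 1L), times = lengths(adjlist) - 1L)
    )
}

# Invented input: a truncated list, a blank entry, a single author, a pair
toy <- c("A Smith, B Jones, ...", "", "C Lee", "B Jones, D Kim")
edgelist <- build_edgelist(toy)
print(edgelist)
# rows: ("b jones", "a smith"), ("c lee", "c lee"), ("d kim", "b jones")
```

The resulting matrix can then be passed to graph_from_edgelist(edgelist, directed=FALSE) as in the first part of the answer.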