Building a network in R
I have a csv file that looks like this:
"","people_id","commit_id"
"1",1,0
"2",1,117
"3",1,144
"4",1,278
…
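Here's the csv file if you want to take a look. It contains 11735 rows but only 5923 unique people IDs.
Does anyone know how to link people IDs that share a common "commit_id", while ignoring commit_id 0, since ID 0 doesn't exist?
So far I have done this: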
library(Matrix)  # spMatrix, tcrossprod
library(igraph)  # graph.adjacency, write.graph

# read the csv file
commitsNetwork <- read.csv("commits.csv", header = TRUE)
# keep only the two columns needed for the network
commitsNetwork <- commitsNetwork[c("people_id", "commit_id")]
# build the people-by-commit incidence matrix (for commits)
C <- spMatrix(nrow = length(unique(commitsNetwork$people_id)),
              ncol = length(unique(commitsNetwork$commit_id)),
              i = as.numeric(factor(commitsNetwork$people_id)),
              j = as.numeric(factor(commitsNetwork$commit_id)),
              x = rep(1, length(as.numeric(commitsNetwork$people_id))))
row.names(C) <- levels(factor(commitsNetwork$people_id))
colnames(C) <- levels(factor(commitsNetwork$commit_id))
# people-by-people matrix: entry [a, b] counts the commit_ids a and b share
adjC <- tcrossprod(C)
comG <- graph.adjacency(adjC, mode = "undirected", weighted = TRUE, diag = FALSE)
# write to pajek file
write.graph(comG, "comNetwork.net", format = "pajek")
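Also, the edges come from column 2, "commit_id": two vertices (people) are connected if they share a common commit_id from column 6.
So I'm not sure how to generate the network in R from this csv file.
The ideal output should look like this: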
*Vertices 5923
1
2
3
4
...
*Edges
1 4 1
1 25 1
1 39 1
1 41 1
1 48 1
until 5923...
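To make the target concrete, here is a minimal base R sketch of the linking rule on a few made-up rows (the data frame values are hypothetical, not taken from the csv). It assumes that two people get an edge when they share at least one non-zero commit_id, and that the third column of the edge list is the number of commits they share.

```r
# Toy data in the same shape as the csv (values are made up for illustration)
commits <- data.frame(
  people_id = c(1, 1, 2, 2, 3, 4),
  commit_id = c(0, 117, 117, 144, 144, 0)
)

# commit_id 0 is a placeholder, so drop those rows before linking people
commits <- commits[commits$commit_id != 0, ]

# People-by-commit incidence matrix: 1 if the person appears with the commit
inc <- (unclass(table(commits$people_id, commits$commit_id)) > 0) * 1
ids <- as.integer(rownames(inc))

# Person-by-person co-occurrence: entry [a, b] counts shared commits
co <- tcrossprod(inc)

# Keep each undirected pair once (upper triangle) where the count is positive
idx <- which(upper.tri(co) & co > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = ids[idx[, 1]],
  to     = ids[idx[, 2]],
  weight = co[idx]
)
print(edges)
```

In this toy case person 4, whose only commit_id is 0, has no rows left after filtering and so does not appear in the edge list at all.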
Maybe you want something like this:
library(igraph)
library(Matrix)

# download the example file to a temporary location and read it
download.file("https://www.dropbox.com/s/q7sxfwjec97qzcy/people.csv?dl=1",
              tf <- tempfile(fileext = ".csv"), mode = "wb")
people <- read.csv(tf)

# sparse people-by-repository incidence matrix
A <- spMatrix(nrow = length(unique(people$people)),
              ncol = length(unique(people$repository_id)),
              i = as.numeric(factor(people$people)),
              j = as.numeric(factor(people$repository_id)),
              x = rep(1, length(as.numeric(people$people))))
row.names(A) <- levels(factor(people$people))
colnames(A) <- levels(factor(people$repository_id))

# people-by-people matrix: entry [a, b] counts the repositories a and b share
adj <- tcrossprod(A)
g <- graph.adjacency(adj, mode = "undirected", weighted = TRUE, diag = FALSE)
See also here.
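A sketch of how the same approach might be adapted to the people_id / commit_id columns from the question, dropping commit_id 0 first. The toy data frame stands in for read.csv("commits.csv") and its values are made up; sparseMatrix() is swapped in for spMatrix(), and graph_from_adjacency_matrix() and write_graph() are the current igraph names for graph.adjacency() and write.graph(). Writing to a temporary directory is an assumption made so the example does not overwrite anything.

```r
library(Matrix)
library(igraph)

# Hypothetical stand-in for read.csv("commits.csv"); values are made up
commits <- data.frame(
  people_id = c(1, 1, 2, 2, 3, 4),
  commit_id = c(0, 117, 117, 144, 144, 0)
)

# Fix the vertex set first so people whose only commit_id is 0 stay as isolates
people <- factor(commits$people_id)

# Drop the placeholder commit_id 0 before building the incidence matrix
keep <- commits$commit_id != 0
pid <- people[keep]
cid <- factor(commits$commit_id[keep])

# Sparse people-by-commit incidence matrix
inc <- sparseMatrix(
  i = as.integer(pid),
  j = as.integer(cid),
  x = rep(1, length(pid)),
  dims = c(nlevels(people), nlevels(cid)),
  dimnames = list(levels(people), levels(cid))
)

# Projection onto people: entry [a, b] is the number of shared commits
adj <- tcrossprod(inc)

g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)

# Pajek export, using the file name from the question inside a temp directory
out <- file.path(tempdir(), "comNetwork.net")
write_graph(g, out, format = "pajek")
```

Taking the factor levels before filtering keeps every person as a vertex, which seems to match the 5923 vertices in the desired output; people who only ever appear with commit_id 0 simply end up with no edges.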