在有向网络中计算二阶邻接矩阵
Computing second-degree adjacency matrix in directed network
我有一个定向网络,我正在尝试构建一个二阶邻接矩阵。
假设网络由相互注视的人组成。从邻接矩阵我知道谁在看谁。
对于第二学位,我的意思是:对于每个人,他是否至少被我看过的人中的一个看过?
然后我想把这个二阶邻接矩阵附加到初始的。
以下代码是我一直在尝试做的事情的可重现示例,它有效,但考虑到我的矩阵的大小,计算可能需要几天时间:
t <- new("dgCMatrix"
, i = c(3L, 4L, 0L, 1L, 2L, 4L, 2L, 3L, 4L, 1L, 2L, 1L)
, p = c(0L, 2L, 6L, 9L, 11L, 12L)
, Dim = c(5L, 5L)
, Dimnames = list(NULL, NULL)
, x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
, factors = list()
)
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
这是输出,它是所需的输出:
5 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘’, ‘’, ‘’ ... ]]
[1,] . 1 . . . . 1 . 1 1
[2,] . 1 . 1 1 1 1 1 1 1
[3,] . 1 1 1 . 1 1 1 1 1
[4,] 1 . 1 . . . 1 1 1 .
[5,] 1 1 1 . . . 1 1 1 1
它不是计算密集型的,只是非常长。我用了 运行 3 个小时,它还没有完成 1% 的过程。
有谁知道更好、更快的方法吗?
感谢您的帮助
下面的可能会快很多,但是结果没有相同的 dimnames
属性。
首先是题目中的代码。原矩阵t
存起来备用
t_save <- t # save this for later
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
result1 <- t
现在另一个代码给出了相同的结果。原始 t
从 t_saved
中检索。并且不需要创建向量 a
.
t <- t_save
b <- integer(length = 5)
t2 <- matrix(NA, nrow = nrow(t), ncol = ncol(t))
for (y in 1:5){ #example with person 1
for (i in 1:5){
b[i] <- any(t[i, ] & t[, y])
}
t2[, y] <- as.integer(b)
}
result2 <- cbind(t, t2)
比较两个结果,发现唯一的区别是暗淡的名称。
all.equal(result1, result2)
#[1] "Attributes: < Component “Dimnames”: Component 2: Modes: character, NULL >"
#[2] "Attributes: < Component “Dimnames”: Component 2: Lengths: 10, 0 >"
#[3] "Attributes: < Component “Dimnames”: Component 2: target is character, current is NULL >"
所以,不要检查属性。
all.equal(result1, result2, check.attributes = FALSE)
#[1] TRUE
编辑。
另一种选择是使用 R 的矩阵乘法。
t <- t_save
t2 <- t %*% t
t2[t2 > 0] <- 1L
result3 <- cbind(t, t2)
all.equal(result2, result3)
#[1] TRUE
基准。
以上3种方法都可以写成只有一个参数的函数,即稀疏矩阵。在矩阵被命名为 t
的问题中,在函数的定义中它将是 A
.
f1 <- function(A){
n <- nrow(A)
a <- numeric(length = n) #create vector for the loop
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
for (j in seq_len(n)){
if (A[i,j] == 1 & A[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
A <- cbind(A, b)
}
A
}
f2 <- function(A){
n <- nrow(A)
t2 <- matrix(NA, nrow = nrow(A), ncol = ncol(A))
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
b[i] <- +any(A[i, ] & A[, y])
}
t2[, y] <- b
}
cbind(A, t2)
}
f3 <- function(A){
t2 <- A %*% A
t2[t2 > 0] <- 1L
cbind(A, t2)
}
现在是测试。为了给它们计时,我将使用包 microbenchmark
.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1(t),
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 2.35833 2.646116 3.354992 2.702440 3.452346 6.795902 10 a
# f2 8.02674 8.062097 8.332795 8.280234 8.398213 9.087690 10 b
# f1 52.08579 52.120208 55.150915 53.949815 57.413373 61.919080 10 c
矩阵乘法函数f3
显然是最快的。
第二个测试将是 运行,矩阵更大。
t_save <- t
for(i in 1:5){
t <- cbind(t, t)
t <- rbind(t, t)
}
dim(t)
#[1] 160 160
并且只会测试 f2
和 f3
。
mb_big <- microbenchmark(
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb_big, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 15.8503 15.94404 16.23394 16.07454 16.19684 17.88267 10 a
# f2 10682.5161 10718.67824 10825.92810 10777.95263 10912.53420 11051.10192 10 b
现在差别很大。
我有一个定向网络,我正在尝试构建一个二阶邻接矩阵。 假设网络由相互注视的人组成。从邻接矩阵我知道谁在看谁。 对于第二学位,我的意思是:对于每个人,他是否至少被我看过的人中的一个看过? 然后我想把这个二阶邻接矩阵附加到初始的。
以下代码是我一直在尝试做的事情的可重现示例,它有效,但考虑到我的矩阵的大小,计算可能需要几天时间:
t <- new("dgCMatrix"
, i = c(3L, 4L, 0L, 1L, 2L, 4L, 2L, 3L, 4L, 1L, 2L, 1L)
, p = c(0L, 2L, 6L, 9L, 11L, 12L)
, Dim = c(5L, 5L)
, Dimnames = list(NULL, NULL)
, x = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
, factors = list()
)
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
这是输出,它是所需的输出:
5 x 10 sparse Matrix of class "dgCMatrix"
[[ suppressing 10 column names ‘’, ‘’, ‘’ ... ]]
[1,] . 1 . . . . 1 . 1 1
[2,] . 1 . 1 1 1 1 1 1 1
[3,] . 1 1 1 . 1 1 1 1 1
[4,] 1 . 1 . . . 1 1 1 .
[5,] 1 1 1 . . . 1 1 1 1
它不是计算密集型的,只是非常长。我用了 运行 3 个小时,它还没有完成 1% 的过程。
有谁知道更好、更快的方法吗?
感谢您的帮助
下面的可能会快很多,但是结果没有相同的 dimnames
属性。
首先是题目中的代码。原矩阵t
存起来备用
t_save <- t # save this for later
a <- numeric(length = 5) #create vector for the loop
b <- numeric(length = 5) #create vector to be filled and then binded
for (y in 1:5){ #example with person 1
for (i in 1:5){
for (j in 1:5){
if (t[i,j] == 1 & t[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
t <- cbind(t, b)
}
result1 <- t
现在另一个代码给出了相同的结果。原始 t
从 t_saved
中检索。并且不需要创建向量 a
.
t <- t_save
b <- integer(length = 5)
t2 <- matrix(NA, nrow = nrow(t), ncol = ncol(t))
for (y in 1:5){ #example with person 1
for (i in 1:5){
b[i] <- any(t[i, ] & t[, y])
}
t2[, y] <- as.integer(b)
}
result2 <- cbind(t, t2)
比较两个结果,发现唯一的区别是暗淡的名称。
all.equal(result1, result2)
#[1] "Attributes: < Component “Dimnames”: Component 2: Modes: character, NULL >"
#[2] "Attributes: < Component “Dimnames”: Component 2: Lengths: 10, 0 >"
#[3] "Attributes: < Component “Dimnames”: Component 2: target is character, current is NULL >"
所以,不要检查属性。
all.equal(result1, result2, check.attributes = FALSE)
#[1] TRUE
编辑。
另一种选择是使用 R 的矩阵乘法。
t <- t_save
t2 <- t %*% t
t2[t2 > 0] <- 1L
result3 <- cbind(t, t2)
all.equal(result2, result3)
#[1] TRUE
基准。
以上3种方法都可以写成只有一个参数的函数,即稀疏矩阵。在矩阵被命名为 t
的问题中,在函数的定义中它将是 A
.
f1 <- function(A){
n <- nrow(A)
a <- numeric(length = n) #create vector for the loop
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
for (j in seq_len(n)){
if (A[i,j] == 1 & A[j,y] == 1){a[j] <- 1}
else {a[j] <- 0}
} #if the ones that i looks at, do look at person 1
if (sum(a) >= 1){b[i] <- 1} else {b[i] <- 0} # if at least one of the people i looks at, looks at 1, then b[i] = 1
}
A <- cbind(A, b)
}
A
}
f2 <- function(A){
n <- nrow(A)
t2 <- matrix(NA, nrow = nrow(A), ncol = ncol(A))
b <- numeric(length = n) #create vector to be filled and then binded
for (y in seq_len(n)){ #example with person 1
for (i in seq_len(n)){
b[i] <- +any(A[i, ] & A[, y])
}
t2[, y] <- b
}
cbind(A, t2)
}
f3 <- function(A){
t2 <- A %*% A
t2[t2 > 0] <- 1L
cbind(A, t2)
}
现在是测试。为了给它们计时,我将使用包 microbenchmark
.
library(microbenchmark)
mb <- microbenchmark(
f1 = f1(t),
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 2.35833 2.646116 3.354992 2.702440 3.452346 6.795902 10 a
# f2 8.02674 8.062097 8.332795 8.280234 8.398213 9.087690 10 b
# f1 52.08579 52.120208 55.150915 53.949815 57.413373 61.919080 10 c
矩阵乘法函数f3
显然是最快的。
第二个测试将是 运行,矩阵更大。
t_save <- t
for(i in 1:5){
t <- cbind(t, t)
t <- rbind(t, t)
}
dim(t)
#[1] 160 160
并且只会测试 f2
和 f3
。
mb_big <- microbenchmark(
f2 = f2(t),
f3 = f3(t),
times = 10
)
print(mb_big, order = "median")
#Unit: milliseconds
# expr min lq mean median uq max neval cld
# f3 15.8503 15.94404 16.23394 16.07454 16.19684 17.88267 10 a
# f2 10682.5161 10718.67824 10825.92810 10777.95263 10912.53420 11051.10192 10 b
现在差别很大。