How to compute closeness centrality measure on a network with disconnected components in R?
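I want to compute the closeness centrality measure on a network with disconnected components. The closeness function in igraph does not give meaningful results on such graphs. (see)
Then I came across this site, which explains that closeness can also be measured on graphs with disconnected components.
The following code is what it suggests for this: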
# Load tnet
library(tnet)
# Load network
# Node K is assigned node id 8 instead of 10 as isolates at the end of id sequences are not recorded in edgelists
net <- cbind(
i=c(1,1,2,2,2,3,3,3,4,4,4,5,5,6,6,7,9,10,10,11),
j=c(2,3,1,3,5,1,2,4,3,6,7,2,6,4,5,4,10,9,11,10),
w=c(1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1))
# Calculate measures
closeness_w(net, gconly=FALSE)
In my case I have transaction data, so the network I build from it is directed and weighted. The weights are 1/(transaction amount).
Here is my sample data:
df <- structure(list(id = c(2557L, 1602L, 18669L, 35900L, 48667L, 51341L
), from = c("5370", "6390", "5370", "5370", "8934", "5370"),
to = c("5636", "5370", "8933", "8483", "5370", "7626"), date = structure(c(13099,
13113, 13117, 13179, 13238, 13249), class = "Date"), amount = c(2921,
8000, 169.2, 71.5, 14.6, 4214)), row.names = c(NA, -6L), class = "data.frame")
I use the code below to do what I want:
library(dplyr)
library(tnet)
# One row per (from, to) pair, weighted by 1 / total amount
df2 <- select(df, c(from, to, amount)) %>%
  group_by(from, to) %>%
  mutate(weights = 1 / sum(amount)) %>%
  select(-amount) %>%
  distinct()
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = T, gconly = FALSE) # here it gives the error: "Error in net[, "w"]^alpha : non-numeric argument to binary operator"
# so I convert the from and to columns as follows to solve the error mentioned above
df2$from <- as.integer(df2$from)
df2$to <- as.integer(df2$to)
# then I run the code again
network <- cbind(df2$from, df2$to, df2$weights)
cl <- closeness_w(network, directed = T, gconly = FALSE)
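However, unlike the output on the website, which contains only a closeness score for each node, my output has a very large number of rows with value 0, and I do not know why.
The output I get looks like this: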
node closeness n.closeness
[1,] 1 0.00000000 0.000000000000
[2,] 2 0.00000000 0.000000000000
[3,] 3 0.00000000 0.000000000000
[4,] 4 0.00000000 0.000000000000
[5,] 5 0.00000000 0.000000000000
...........................................................
[330,] 330 0.00000000 0.000000000000
[331,] 331 0.00000000 0.000000000000
[332,] 332 0.00000000 0.000000000000
[333,] 333 0.00000000 0.000000000000
[ reached getOption("max.print") -- omitted 8600 rows ]
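In addition, in the data given on the website the entries in the i and j columns are reciprocal, i.e. 1->2 exists if and only if 2->1 exists. My data is not like that: 5370 sent money to 5636, but 5636 has not sent money to 5370. So how can I correctly compute the closeness measure on this kind of directed transaction network? Has anyone tried a similar calculation before?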
EDIT: Since the closeness_w function treats the weights as tie strength rather than as distance, I should have set weights to sum(amount) instead of 1/sum(amount).
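A minimal base R sketch of what that edit implies, assuming (as the edit states) that closeness_w reads w as tie strength: the total amount per directed pair becomes the weight. The names edge_list and w are illustrative and do not come from the original post.

```r
# Sample transactions from the question (from, to, amount)
df <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)
)

# One row per directed (from, to) pair, with the total amount as tie strength
edge_list <- aggregate(amount ~ from + to, data = df, FUN = sum)
names(edge_list)[names(edge_list) == "amount"] <- "w"

# Larger transfers now mean larger w, i.e. a stronger tie
edge_list[order(-edge_list$w), ]
```

With 1/sum(amount) the ordering would be reversed, so the smallest transfer would count as the strongest tie.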
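The reason you get so many rows with zero values is that it returns closeness values for nodes 1 to 8934 (the maximum value in the matrix). If you filter on the values in your data frame, you will find the values you are looking for: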
cl <- closeness_w(df2, directed = T, gconly=FALSE)
cl[cl[, "node"] %in% c(df2$from), ]
node closeness n.closeness
[1,] 5370 1.37893704 1.543644e-04
[2,] 6390 0.03668555 4.106745e-06
[3,] 8934 5.80008056 6.492870e-04
Direction is taken into account: if you filter on the 'to' nodes, you will see that only 5370 has a non-zero value:
cl[cl[, "node"] %in% c(df2$to), ]
node closeness n.closeness
[1,] 5370 1.378937 0.0001543644
[2,] 5636 0.000000 0.0000000000
[3,] 7626 0.000000 0.0000000000
[4,] 8483 0.000000 0.0000000000
[5,] 8933 0.000000 0.0000000000
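If you go back to the example you are following and remove nodes from the middle of the data, you will see that it gives zeros for the missing nodes. Then try setting directed = F and you will notice the difference.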
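The effect of large node ids can be seen without tnet. The base R sketch below only illustrates the bookkeeping: with the raw ids the implied node range is 1 to 8934, while mapping the ids that actually occur to 1..n leaves 7 nodes. The names ids, i and j are illustrative.

```r
# Sender and receiver ids from the sample data in the question
from <- c(5370L, 6390L, 5370L, 5370L, 8934L, 5370L)
to   <- c(5636L, 5370L, 8933L, 8483L, 5370L, 7626L)

# With raw ids, the node range runs from 1 to the largest id
max(from, to)          # 8934

# Map the distinct ids that actually occur to consecutive integers 1..n
ids <- sort(unique(c(from, to)))
i <- match(from, ids)
j <- match(to, ids)
length(ids)            # 7

# Lookup table to translate the compact ids back to the original labels
data.frame(node = seq_along(ids), label = ids)
```

An edge list built from i and j has no gaps in its id sequence, so no rows would be needed for ids that never occur in the data.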
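Update:
If you want an alternative way to create the network, after creating df2 you can simply pass it to the closeness_w function. Your node labels then become the index, and the node column is reduced to 1:n: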
df2 <- df %>%
group_by(from, to) %>%
mutate(weights = 1/sum(amount)) %>%
select(from, to, weights) %>%
distinct
cl <- closeness_w(df2, directed = T, gconly=FALSE)
cl
node closeness n.closeness
5370 1 1.37893704 0.229822840
5636 2 0.00000000 0.000000000
7626 3 0.00000000 0.000000000
8483 4 0.00000000 0.000000000
8933 5 0.00000000 0.000000000
6390 6 0.03668555 0.006114259
8934 7 5.80008056 0.966680093
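The web page you cite does not say that "closeness can be applied to disconnected networks". Instead, it suggests computing a quantity that is quite different from closeness.
What they compute is in fact the so-called global efficiency, proposed in this paper:
You will find implementations in some packages. I have also implemented it for igraph; it will be included in version 0.9 of C/igraph (and presumably in some version of R/igraph as well). It is already available from IGraph/M, the Mathematica interface to igraph.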
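For reference, the quantity under discussion can be written in a few lines of base R. This is a sketch under assumptions, not tnet's or igraph's implementation. It takes the 1/amount strengths from the question and converts them to distances as mean(strength)/strength, an assumption that is consistent with the closeness values printed earlier in this thread. It then runs Floyd-Warshall and sums inverse distances, so unreachable pairs contribute 0. Averaging the inverse distances over all ordered pairs gives the global efficiency. All variable names are illustrative.

```r
# Sample transactions from the question; each (from, to) pair occurs once here,
# so no aggregation is needed (aggregate first if pairs repeat)
edges <- data.frame(
  from   = c("5370", "6390", "5370", "5370", "8934", "5370"),
  to     = c("5636", "5370", "8933", "8483", "5370", "7626"),
  amount = c(2921, 8000, 169.2, 71.5, 14.6, 4214)
)

# ASSUMPTION: strength = 1 / amount (as in the question), and
# distance = mean(strength) / strength
strength <- 1 / edges$amount
edges$dist <- mean(strength) / strength

nodes <- sort(unique(c(edges$from, edges$to)))
n <- length(nodes)

# Directed distance matrix: Inf means "no path", 0 on the diagonal
D <- matrix(Inf, n, n, dimnames = list(nodes, nodes))
diag(D) <- 0
D[cbind(match(edges$from, nodes), match(edges$to, nodes))] <- edges$dist

# Floyd-Warshall all-pairs shortest paths, one intermediate node k at a time
for (k in seq_len(n)) {
  D <- pmin(D, outer(D[, k], D[k, ], `+`))
}

# Inverse distances: 1 / Inf is 0, so unreachable pairs simply drop out
inv <- 1 / D
diag(inv) <- 0

inv_closeness <- rowSums(inv)                  # per node, outgoing paths
global_efficiency <- sum(inv) / (n * (n - 1))  # average over ordered pairs

round(inv_closeness, 6)
global_efficiency
```

On the sample data, inv_closeness comes out at about 1.379 for 5370, 0.037 for 6390 and 5.800 for 8934, which agrees with the closeness column shown earlier in this thread. The other four nodes have no outgoing edges and get 0.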