Difference between the closeness function in R and a manual computation
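I have an undirected weighted graph on which I want to compute the closeness measure. According to the igraph documentation, closeness is the inverse of the average shortest-path length. I computed the shortest paths and took the inverse of their mean, but I still do not get the same values as the closeness function. Why is that? What am I missing?
Here is my code: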
dput(c$estimate)
structure(c(1, 10000, 10000, 2.69857209553848, 5.77115055524614,
1.95672007809809, 2.98690863617922, 1.92161847347611, 10000,
10000, 10000, 10000, 1, 1.97201563662035, 5.4078452590091, 10000,
6.8534542161595, 3.51453278996925, 10000, 10000, 2.08964950396744,
10000, 10000, 1.97201563662034, 1, 2.78868220464485, 10000, 3.41857460835551,
10000, 1.96044036389546, 10000, 10000, 10000, 2.69857209553835,
5.40784525900909, 2.78868220464486, 1, 10000, 10000, 3.54317409176484,
10000, 2.33889236077342, 10000, 10000, 5.77115055524604, 10000,
10000, 10000, 1, 10000, 10000, 10000, 10000, 10000, 10000, 1.95672007809807,
6.85345421615961, 3.41857460835555, 10000, 10000, 1, 10000, 10000,
2.49075030691086, 10000, 10000, 2.98690863617922, 3.51453278996926,
10000, 3.54317409176474, 10000, 10000, 1, 10000, 10000, 10000,
1.73687483250751, 1.92161847347613, 10000, 1.96044036389548,
10000, 10000, 10000, 10000, 1, 4.24032760636799, 3.11756167665886,
5.07827243244947, 10000, 10000, 10000, 2.33889236077345, 10000,
2.49075030691088, 10000, 4.24032760636804, 1, 10000, 1.69643890905686,
10000, 2.08964950396742, 10000, 10000, 10000, 10000, 10000, 3.11756167665892,
10000, 1, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 1.73687483250752,
5.0782724324492, 1.69643890905687, 10000, 1), .Dim = c(11L, 11L
), .Dimnames = list(c("jpm", "gs", "ms", "bofa", "schwab", "brk",
"wf", "citi", "amex", "spgl", "pnc"), c("jpm", "gs", "ms", "bofa",
"schwab", "brk", "wf", "citi", "amex", "spgl", "pnc")))
library(igraph)
g <- graph_from_adjacency_matrix(c$estimate, weighted = "wt", mode = "undirected", diag = FALSE)
closeness(g, weights = round(E(g)$wt, 2))
       jpm         gs         ms       bofa     schwab        brk         wf       citi
0.02503756 0.01877229 0.02203614 0.02151463 0.01088495 0.02189621 0.02226180 0.02418380
      amex       spgl        pnc
0.01988072 0.01632387 0.01913509
# manual
a <- shortest.paths(g, weights = round(E(g)$wt, 2))
1 / rowMeans(a)
      jpm        gs        ms      bofa    schwab       brk        wf      citi      amex
0.2799695 0.2143414 0.2435245 0.2457002 0.1205876 0.2408583 0.2448798 0.2660218 0.2276490
     spgl       pnc
0.1855914 0.2140078
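There are two things you may want to check:
- You should set normalized = TRUE in closeness(). Without it, igraph returns the inverse of the sum of the shortest-path distances rather than the inverse of their mean.
- When you define closeness centrality through shortest-path lengths, the average is taken over the distances to the other vertices, excluding the vertex itself. The denominator of the average is therefore vcount(g) - 1, not vcount(g), which is why rowMeans() should not be used.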
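To make the two denominators concrete, here is a minimal sketch in base R. The 3-vertex distance matrix d and its values are made up for illustration and are not taken from the question's data.

```r
# Hypothetical symmetric shortest-path distance matrix for 3 vertices;
# the diagonal holds each vertex's zero distance to itself.
d <- matrix(c(0, 1, 3,
              1, 0, 2,
              3, 2, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("a", "b", "c"), c("a", "b", "c")))
n <- nrow(d)

raw        <- 1 / rowSums(d)        # inverse of the summed distances
normalized <- (n - 1) / rowSums(d)  # inverse of the mean over the other n - 1 vertices
row_mean   <- 1 / rowMeans(d)       # divides by n, so the zero self-distance is counted

# The row-mean version exceeds the normalized value by a factor n / (n - 1).
print(rbind(raw, normalized, row_mean))
```

The same pattern is visible in the question's output: the non-normalized closeness values are roughly one tenth of the normalized ones, because the graph has 11 vertices and so 10 other vertices per row.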
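As the code below shows, the two approaches give close results: identical for brk, wf and citi, and differing from the second or third decimal place for the other vertices (this may be a precision issue, but I am not sure).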
> closeness(g, weights = E(g)$wt, normalized = TRUE)
      jpm        gs        ms      bofa    schwab       brk        wf      citi
0.2504451 0.1876864 0.2203154 0.2151935 0.1088503 0.2190827 0.2226391 0.2418350
     amex      spgl       pnc
0.1988941 0.1632546 0.1914826
> (vcount(g) - 1) / rowSums(shortest.paths(g, weights = E(g)$wt))
      jpm        gs        ms      bofa    schwab       brk        wf      citi
0.2545725 0.1947856 0.2213624 0.2234093 0.1096228 0.2190827 0.2226391 0.2418350
     amex      spgl       pnc
0.2070431 0.1687258 0.1946688
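The same comparison can be tried on a small graph where the distances are easy to check by hand. This is only a sketch: it assumes the igraph package is installed, the 4-vertex graph and its weights are hypothetical, and distances() is used as the current igraph name for shortest.paths().

```r
library(igraph)

# Hypothetical connected, undirected, weighted toy graph (not the question's data).
edges <- data.frame(from   = c("a", "b", "c", "a"),
                    to     = c("b", "c", "d", "d"),
                    weight = c(1, 2, 1.5, 4))
g <- graph_from_data_frame(edges, directed = FALSE)

# Weighted shortest-path distances between all vertex pairs.
d <- distances(g, weights = E(g)$weight)

# Manual normalized closeness: average over the other vcount(g) - 1 vertices.
manual  <- (vcount(g) - 1) / rowSums(d)
builtin <- closeness(g, weights = E(g)$weight, normalized = TRUE)

# Compare the two up to floating-point tolerance.
print(all.equal(manual, builtin))
```

Because this toy graph is connected, every vertex reaches all the others, so both expressions average over the same set of distances.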