Calculate inverse log-weighted similarity in a bimodal network, igraph in Python

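I am trying to compute the Adamic-Adar similarity for a network that has two types of nodes. I am only interested in the similarity between nodes that have outgoing connections. The nodes with incoming connections act as connectors, and I am not interested in them.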
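For reference, the Adamic-Adar (inverse log-weighted) score of two nodes is the sum, over the neighbours they share, of 1 / log(degree of that neighbour). The plain-Python sketch below computes it for the setup described here, where source nodes point at connectors. The edge list and the names `edges`, `sources_of` and `similarity` are hypothetical and only serve as illustration; this is a direct restatement of the formula, not igraph's implementation.

```python
import math
from collections import defaultdict
from itertools import combinations

# Hypothetical toy edge list: (source node, connector node).
edges = [("a", "x"), ("b", "x"), ("a", "y"), ("b", "y"), ("c", "y")]

# Collect the set of sources that point at each connector.
sources_of = defaultdict(set)
for src, conn in edges:
    sources_of[conn].add(src)

# Each connector adds 1 / log(its in-degree) to every pair of sources it links.
similarity = defaultdict(float)
for conn, srcs in sources_of.items():
    if len(srcs) < 2:
        continue  # a connector with a single source links no pair
    weight = 1.0 / math.log(len(srcs))
    for u, v in combinations(sorted(srcs), 2):
        similarity[(u, v)] += weight

# Long-form output: one line per pair with a non-zero score.
for (u, v), value in sorted(similarity.items()):
    print(u, v, round(value, 4))
```

A brute-force version like this is only practical for small graphs, but it can serve as a cross-check for the library result.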

Data size and characteristics:

> summary(g)
IGRAPH DNW- 3852 24478 -- 
+ attr: name (v/c), weight (e/n)

Prototype code in Python 2.7:

import glob
import os
import pandas as pd
from igraph import *

os.chdir("data/")
for file in glob.glob("*.graphml"):
    print(file)
    g = Graph.Read_GraphML(file)
    indegree = Graph.degree(g, mode="in")

    g['indegree'] = indegree
    dev = g.vs.select(indegree == 0)

    m = Graph.similarity_inverse_log_weighted(dev.subgraph())

    df = pd.melt(m)

    df.to_csv(file.split("_only.graphml")[0] + "_similarity.csv", sep=',')

This code is broken: dev has length 1 and m is 0.0, so it does not work as expected.

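Hint

I have working code in R, but I cannot seem to rewrite it in Python (I want the Python version for performance, because the networks are large). Here it is: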

   # make sure g is your network
   indegree <- degree(g, mode="in")
   V(g)$indegree <- indegree
   dev <- V(g)[indegree==0]
   m <- similarity.invlogweighted(g, dev)
   x.m <- melt(m)
   colnames(x.m) <- c("dev1", "dev2", "value")
   x.m <- x.m[x.m$value > 0, ]

   write.csv(x.m, file = sub(".csv",
                             "_similarity.csv", filename))

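You assigned the in-degree as a graph attribute rather than a vertex attribute, so you cannot meaningfully call g.vs.select() on it later. The filter must also be passed as a keyword argument (indegree=0): the expression indegree == 0 is evaluated by Python first, as a comparison between a list and an integer, before select() ever sees it. You need this: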

indegree = g.degree(mode="in")
g.vs["indegree"] = indegree
dev = g.vs.select(indegree=0)

But actually, you can simply write:

dev = g.vs.select(_indegree=0)

This works because of how the select method works:

Attribute names inferred from keyword arguments are treated specially if they start with an underscore (_). These are not real attributes but refer to specific properties of the vertices, e.g., its degree. The rule is as follows: if an attribute name starts with an underscore, the rest of the name is interpreted as a method of the Graph object. This method is called with the vertex sequence as its first argument (all others left at default values) and vertices are filtered according to the value returned by the method.