Python: Jaccard similarity between two networks?
I have two large networks, G and G1, generated with the networkx package. I want to compute the Jaccard similarity index between all the nodes.
One possible way is the following:
import numpy as np

def returnJaccardNetworks(G, G1):
    tmp = list(G.nodes())
    tmp1 = list(G1.nodes())
    tmp2 = np.unique(tmp + tmp1)  ### Find nodes in the networks
    jc = []
    for i in tmp2:
        ## if the node i is in G and in G1 compute
        ## the similarity between the lists of the adjacent nodes
        ## otherwise append 0
        if (i in G) and (i in G1):
            k1 = list(G[i])  ## adjacent nodes of i in the network G
            k2 = list(G1[i])  ## adjacent nodes of i in the network G1
            ### Start Jaccard Similarity
            intersect = list(set(k1) & set(k2))
            n = len(intersect)
            jc.append(n / float(len(k1) + len(k2) - n))
            ### End Jaccard Similarity
        else:
            jc.append(0)
    return jc
I am wondering if there is a more efficient way. I noticed that the package has a function called jaccard_coefficient, but I am not sure how it works.
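Your implementation is already quite efficient (though not perfect, IMO). With this version I can shave about 15% off the execution time on my machine: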
def get_jaccard_coefficients(G, H):
    for v in G:
        if v in H:
            n = set(G[v])  # neighbors of v in G
            m = set(H[v])  # neighbors of v in H
            length_intersection = len(n & m)
            length_union = len(n) + len(m) - length_intersection
            yield v, float(length_intersection) / length_union
        else:
            yield v, 0.  # should really yield v, None as measure is not defined for these nodes
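As the comment in the code above notes, the coefficient is not defined for nodes that are missing from one of the graphs. The same holds when a node has no neighbors in either graph: the union is empty and the division would fail. Here is a minimal sketch of a variant that yields None in both cases. The name `get_jaccard_coefficients_or_none` is hypothetical, and plain dicts of neighbor sets stand in for the networkx graphs. That substitution is an assumption; it works because the function only relies on iteration, `in`, and `G[v]`, which networkx graphs also support.

```python
def get_jaccard_coefficients_or_none(G, H):
    """Yield (node, coefficient) for every node of G; None where undefined."""
    for v in G:
        if v not in H:
            yield v, None  # v is missing from H, so there is nothing to compare
            continue
        n = set(G[v])  # neighbors of v in G
        m = set(H[v])  # neighbors of v in H
        union_size = len(n | m)
        if union_size == 0:
            yield v, None  # v has no neighbors in either graph
        else:
            yield v, len(n & m) / union_size


# Toy adjacency mappings standing in for two networkx graphs (hypothetical data)
G = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}, "d": set()}
H = {"a": {"b"}, "b": {"a"}, "d": set()}
coefficients = dict(get_jaccard_coefficients_or_none(G, H))
```

Collecting the generator into a dict keeps each coefficient attached to its node, and callers can drop the None entries before averaging.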
Another version is more compact and easier to maintain, at the cost of a 30% increase in execution time:
def get_jaccard_coefficients(G, H):
    for v in set(G.nodes) & set(H.nodes):  # i.e. the intersection
        n = set(G[v])  # neighbors of v in G
        m = set(H[v])  # neighbors of v in H
        yield v, len(n & m) / float(len(n | m))
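On the jaccard_coefficient function mentioned in the question: as far as I know, the networkx built-in compares the neighbor sets of pairs of nodes within a single undirected graph, rather than the same node across two graphs, so it does not directly replace the functions above. A small sketch on a toy graph (the graph data is made up for illustration):

```python
import networkx as nx

# Toy undirected graph (hypothetical data)
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])

# Each result is a (u, v, coefficient) triple for one requested node pair,
# where the coefficient compares the neighbors of u and v inside G.
pairs = [(0, 3), (1, 3)]
result = {(u, v): p for u, v, p in nx.jaccard_coefficient(G, pairs)}
```

Here nodes 0 and 3 share one neighbor (node 2) out of two distinct neighbors in total, so the pair scores 0.5.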