绘制 NetworkX Girvan-Newman 算法发现的社区树状图

Question

用于网络社区检测的 Girvan-Newman 算法：

detects communities by progressively removing edges from the original graph. The algorithm removes the “most valuable” edge, traditionally the edge with the highest betweenness centrality, at each step. As the graph breaks down into pieces, the tightly knit community structure is exposed and the result can be depicted as a dendrogram.

在 NetworkX 中，实现 returns 集合元组上的迭代器。第一个元组是由 2 个社区组成的第一个切割，第二个元组是由 3 个社区组成的第二个切割，依此类推，直到最后一个元组具有 n 个单独节点（树状图的叶子）的 n 个集合。

import networkx as nx

G = nx.path_graph(10)
comp = nx.community.girvan_newman(G)
list(comp)

[({0, 1, 2, 3, 4}, {5, 6, 7, 8, 9}), ({0, 1}, {2, 3, 4}, {5, 6, 7, 8, 9}), ({0, 1}, {2, 3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {8, 9, 7}), ({0, 1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3, 4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5, 6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8, 9}), ({0}, {1}, {2}, {3}, {4}, {5}, {6}, {7}, {8}, {9})]

问题是：如何绘制此树状图？

我看过 scipy.cluster.hierarchy.dendrogram but it expects a "linkage matrix" I'm guessing such as the one created by scipy.cluster.hierarchy.linkage，但我不确定如何将这个元组列表转换成这个 "linkage matrix"。

所以我想问一下如何绘制这个树状图，with/without SciPy 的 dendrogram 的帮助。

Answer 1

在@ItamarMushkin 之后，我按照@mdml 的回答稍加修改并得到了我想要的。在高层次上，我将 NetworkX 的 Girvan-Newman 迭代器输出转换为另一个 DiGraph() 我最终希望将其视为树状图。然后我构建 Z，一个我输入到 scipy.cluster.hierarchy.dendrogram 的链接矩阵，其形式为边缘列表，其中包括每个树状图合并的实际高度。

我必须对@mdml 的回答进行两处修改：

没那么重要：我对进入 index
更重要的是：我添加了一个 get_merge_height 函数，它根据 Girvan-Newman 输出的边移除顺序为每个合并提供其唯一的高度。否则，两个节点的所有合并在树状图中都是相同的高度，所有合并的下一级合并两个节点和另一个节点将是相同的高度，等等

我知道这里可能会有一些重复的迭代，我还没有考虑优化。

import networkx as nx
from itertools import chain, combinations
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram

# get simulated Graph() and Girvan-Newman communities list
G = nx.path_graph(10)
communities = list(nx.community.girvan_newman(G))

# building initial dict of node_id to each possible subset:
node_id = 0
init_node2community_dict = {node_id: communities[0][0].union(communities[0][1])}
for comm in communities:
    for subset in list(comm):
        if subset not in init_node2community_dict.values():
            node_id += 1
            init_node2community_dict[node_id] = subset

# turning this dictionary to the desired format in @mdml's answer
node_id_to_children = {e: [] for e in init_node2community_dict.keys()}
for node_id1, node_id2 in combinations(init_node2community_dict.keys(), 2):
    for node_id_parent, group in init_node2community_dict.items():
        if len(init_node2community_dict[node_id1].intersection(init_node2community_dict[node_id2])) == 0 and group == init_node2community_dict[node_id1].union(init_node2community_dict[node_id2]):
            node_id_to_children[node_id_parent].append(node_id1)
            node_id_to_children[node_id_parent].append(node_id2)

# also recording node_labels dict for the correct label for dendrogram leaves
node_labels = dict()
for node_id, group in init_node2community_dict.items():
    if len(group) == 1:
        node_labels[node_id] = list(group)[0]
    else:
        node_labels[node_id] = ''

# also needing a subset to rank dict to later know within all k-length merges which came first
subset_rank_dict = dict()
rank = 0
for e in communities[::-1]:
    for p in list(e):
        if tuple(p) not in subset_rank_dict:
            subset_rank_dict[tuple(sorted(p))] = rank
            rank += 1
subset_rank_dict[tuple(sorted(chain.from_iterable(communities[-1])))] = rank

# my function to get a merge height so that it is unique (probably not that efficient)
def get_merge_height(sub):
    sub_tuple = tuple(sorted([node_labels[i] for i in sub]))
    n = len(sub_tuple)
    other_same_len_merges = {k: v for k, v in subset_rank_dict.items() if len(k) == n}
    min_rank, max_rank = min(other_same_len_merges.values()), max(other_same_len_merges.values())
    range = (max_rank-min_rank) if max_rank > min_rank else 1
    return float(len(sub)) + 0.8 * (subset_rank_dict[sub_tuple] - min_rank) / range

# finally using @mdml's magic, slightly modified:
G           = nx.DiGraph(node_id_to_children)
nodes       = G.nodes()
leaves      = set( n for n in nodes if G.out_degree(n) == 0 )
inner_nodes = [ n for n in nodes if G.out_degree(n) > 0 ]

# Compute the size of each subtree
subtree = dict( (n, [n]) for n in leaves )
for u in inner_nodes:
    children = set()
    node_list = list(node_id_to_children[u])
    while len(node_list) > 0:
        v = node_list.pop(0)
        children.add( v )
        node_list += node_id_to_children[v]
    subtree[u] = sorted(children & leaves)

inner_nodes.sort(key=lambda n: len(subtree[n])) # <-- order inner nodes ascending by subtree size, root is last

# Construct the linkage matrix
leaves = sorted(leaves)
index  = dict( (tuple([n]), i) for i, n in enumerate(leaves) )
Z = []
k = len(leaves)
for i, n in enumerate(inner_nodes):
    children = node_id_to_children[n]
    x = children[0]
    for y in children[1:]:
        z = tuple(sorted(subtree[x] + subtree[y]))
        i, j = index[tuple(sorted(subtree[x]))], index[tuple(sorted(subtree[y]))]
        Z.append([i, j, get_merge_height(subtree[n]), len(z)]) # <-- float is required by the dendrogram function
        index[z] = k
        subtree[z] = list(z)
        x = z
        k += 1

# dendrogram
plt.figure()
dendrogram(Z, labels=[node_labels[node_id] for node_id in leaves])
plt.savefig('dendrogram.png')

绘制 NetworkX Girvan-Newman 算法发现的社区树状图

Plot the dendrogram of communities found by NetworkX Girvan-Newman algorithm

python

matplotlib

dendrogram

scipy

networkx