Unique ID for network clusters

Question

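I have a network of nodes (parents/children, each node has an ID), and I want to generate a unique ID for each cluster of connected nodes. I am using Python, Pandas, and networkx.

For example, I have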

id  a    b    c
1  101  201  301
2  101  202  302
3  102  202  302
4  103  203  303
5  103  204  304

For example, rows 1 and 2 are linked through column a, because both have the value 101 there.

I would like to obtain

id  a    b    c   id_cluster
1  101  201  301      1
2  101  202  302      1
3  102  202  302      1
4  103  203  303      2
5  103  204  304      2

Answer 1

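So, if I understand correctly, this amounts to having two types of nodes:

The nodes in the DataFrame, which have an id
The combinations of column and value in the DataFrame

The DataFrame itself supplies the edges of the graph: each cell links a row id to a (column, value) node.

So, (a, 101) is connected to 1 & 2

and (b, 202) is connected to 2 & 3

Therefore 1, 2, 3, (a, 101), (a, 102), (b, 201), (b, 202), (c, 301), and (c, 302) are all connected.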
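To make the two kinds of nodes concrete, here is a minimal sketch in plain Python, without pandas or networkx. The `rows` dictionary hard-codes the sample data from the question, and `adjacency` and `reachable` are illustrative names of my own, not part of any library.

```python
from collections import defaultdict, deque

# Sample data from the question: row id -> {column: value}.
rows = {
    1: {"a": 101, "b": 201, "c": 301},
    2: {"a": 101, "b": 202, "c": 302},
    3: {"a": 102, "b": 202, "c": 302},
    4: {"a": 103, "b": 203, "c": 303},
    5: {"a": 103, "b": 204, "c": 304},
}

# Two kinds of nodes: row ids on one side, (column, value) tuples on the other.
adjacency = defaultdict(set)
for row_id, cells in rows.items():
    for column, value in cells.items():
        adjacency[row_id].add((column, value))
        adjacency[(column, value)].add(row_id)


def reachable(start):
    """Return every node reachable from `start` (breadth-first search)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


component = reachable(1)
print(sorted(adjacency[("a", 101)]))  # [1, 2]
print(sorted(n for n in component if not isinstance(n, tuple)))  # [1, 2, 3]
```

Rows 4 and 5 never show up in `component`, since they share no (column, value) node with rows 1 to 3.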

I am not familiar with networkx, but there seems to be a function called connected_components that gives you the connected components of a graph. So,

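import pandas as pd
import networkx as nx
from io import StringIO  # Python 3 location; the Python 2 StringIO module is gone


# Load the sample table; columns are separated by runs of whitespace.
df = pd.read_csv(StringIO("""
id  a    b    c
1  101  201  301
2  101  202  302
3  102  202  302
4  103  203  303
5  103  204  304"""), sep=r'\s+')

df = df.set_index('id')

# Add one edge per cell: row id <-> (column, value).
G = nx.Graph()
for (id_, column), other_node in df.stack().items():
    G.add_edge(id_, (column, other_node))

# Number the connected components from 1 and keep only the row ids;
# the (column, value) nodes are tuples, so they are filtered out.
cluster_map = pd.Series(
    {id_: id_cluster + 1
     for id_cluster, ids in enumerate(nx.connected_components(G))
     for id_ in ids
     if not isinstance(id_, tuple)},
    name='id_cluster')

df = df.join(cluster_map)
print(df)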

This yields

      a    b    c  id_cluster
id                           
1   101  201  301           1
2   101  202  302           1
3   102  202  302           1
4   103  203  303           2
5   103  204  304           2
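If pulling in networkx just for this step feels heavy, the same clustering can also be sketched with a small union-find (disjoint-set) structure. This is a different technique from the connected_components approach above, shown only as an alternative; `find`, `union`, `labels`, and `cluster_ids` are hypothetical names, and the rows are hard-coded from the question rather than read from a DataFrame.

```python
# Sample rows from the question: (id, a, b, c).
rows = [
    (1, 101, 201, 301),
    (2, 101, 202, 302),
    (3, 102, 202, 302),
    (4, 103, 203, 303),
    (5, 103, 204, 304),
]
columns = ("a", "b", "c")

parent = {}  # node -> parent node; a root is its own parent


def find(node):
    """Return the representative of node's set, compressing the path."""
    parent.setdefault(node, node)
    root = node
    while parent[root] != root:
        root = parent[root]
    # Second pass: point every node on the path directly at the root.
    while parent[node] != root:
        parent[node], node = root, parent[node]
    return root


def union(left, right):
    """Merge the sets containing left and right."""
    parent[find(left)] = find(right)


# Link every row to each of its (column, value) nodes.
for row_id, *values in rows:
    for column, value in zip(columns, values):
        union(("row", row_id), (column, value))

# Number the clusters 1, 2, ... in order of first appearance.
labels = {}
cluster_ids = {}
for row_id, *_ in rows:
    root = find(("row", row_id))
    cluster_ids[row_id] = labels.setdefault(root, len(labels) + 1)

print(cluster_ids)  # {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}
```

With a real DataFrame, `cluster_ids` could be turned into a Series and joined on the id index in the same way as `cluster_map` above.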

python

cluster-analysis

networkx

pandas