NetworkX Minimum Spanning Tree has different cluster arrangement with the same data?

Question

I have a large dataset that compares products by a relatedness measure, like this:

product1      product2  relatedness
0101          0102      0.047619
0101          0103      0.023810
0101          0104      0.095238
0101          0105      0.214286
0101          0106      0.047619
...           ...       ...

I use the following code to feed the data into the NetworkX drawing tools and produce an MST graph:

import networkx as nx
import matplotlib.pyplot as plt

products = (data['product1'])
products = list(dict.fromkeys(products))
products = sorted(products)

G = nx.Graph()
G.add_nodes_from(products)
print(G.number_of_nodes())
print(G.nodes())

# Add an edge for every product pair with positive relatedness
for c, p, w in zip(data['product1'], data['product2'], data['relatedness']):
    if w > 0:
        G.add_edge(c, p, weight=w)

nx.draw(nx.minimum_spanning_tree(G), with_labels=True)
plt.show()

The resulting graph looks like this: https://i.imgur.com/pBbcPGc.jpg

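However, when I rerun the code with the same data and no modifications, the arrangement of the clusters appears to change, so the graph looks different. Example here: https://i.imgur.com/4phvFGz.jpg, second example here: https://i.imgur.com/f2YepVx.jpg. The clusters, edges, and weights do not appear to change, but their arrangement in the graph space changes every time.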

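What causes the arrangement of the nodes to change on every run when neither the code nor the data has changed? How can I rewrite this code to produce a network graph with approximately the same arrangement of nodes and edges every time for the same data?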

Answer 1

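The nx.draw method uses spring_layout by default (see the documentation). This layout implements the Fruchterman-Reingold force-directed algorithm, which starts from random initial positions. That is the effect you are seeing across your repeated runs.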

If you want to "fix" the positions, you should call the spring_layout function explicitly and specify the initial positions in the pos parameter.
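As a minimal sketch of the explicit call: instead of supplying initial positions through pos, the example below fixes the random starting positions with the seed parameter, assuming a NetworkX version in which spring_layout accepts seed. The four edges are the sample rows shown in the question, and the seed value 42 is an arbitrary choice.

```python
import networkx as nx
import numpy as np

# The four sample rows from the question, used as a small stand-in graph.
G = nx.Graph()
G.add_weighted_edges_from([
    ("0101", "0102", 0.047619),
    ("0101", "0103", 0.023810),
    ("0101", "0104", 0.095238),
    ("0101", "0105", 0.214286),
])
T = nx.minimum_spanning_tree(G)

# The same seed gives the same random starting positions,
# so both calls should converge to the same layout.
pos_a = nx.spring_layout(T, seed=42)
pos_b = nx.spring_layout(T, seed=42)
same_layout = all(np.allclose(pos_a[n], pos_b[n]) for n in T.nodes)

# To draw with the reproducible positions:
# nx.draw(T, pos=pos_a, with_labels=True)
```

Passing the computed dictionary to nx.draw through pos keeps nx.draw from computing a fresh random layout on its own.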

Answer 2

For clarity, assign G = nx.minimum_spanning_tree(G). Then

nx.draw(G, with_labels=True)

is equivalent to

pos = nx.spring_layout(G)
nx.draw(G, pos=pos, with_labels=True)

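Since you don't want pos to be computed randomly every time you run the script, one way to keep pos stable is to store it once and retrieve it from a file on every rerun. You can place the following script, which computes pos in an improved way, before nx.draw(G, pos=pos, with_labels=True):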

import os, json

def store(pos):
    # convert the numpy arrays to lists so the dictionary can be written as JSON
    return {k: v.tolist() for k, v in pos.items()}
def retrieve(pos):
    # JSON keys are always strings; convert them back to the node type (float here)
    return {float(k): v for k, v in pos.items()}

if os.path.exists('pos.txt'):
    with open('pos.txt') as infile:
        pos = retrieve(json.load(infile)) # retrieving dictionary from file
    print('retrieve', pos)
else:
    pos = nx.spring_layout(G) # calculates pos
    print('store', pos)
    with open('pos.txt', 'w') as outfile:
        json.dump(store(pos), outfile, indent=4) # records pos dictionary into file

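This is an ugly solution because it depends unconditionally on the data types used in the pos dictionary. It worked for me, but you can define your own custom conversions to use in store and retrieve.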
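One possible custom variant, sketched under assumptions: writing the positions as a list of [node, [x, y]] pairs keeps node labels as JSON values rather than JSON keys, so string and integer labels come back with their original type. The names save_pos and load_pos and the file name pos.json are hypothetical, and the positions below are made up for the round trip. This only covers labels that JSON can represent directly (strings and numbers).

```python
import json
import os
import tempfile

def save_pos(pos, path):
    # Store [node, [x, y]] pairs so node labels are JSON values, not JSON keys.
    # float() also handles numpy scalars as returned by nx.spring_layout.
    pairs = [[node, [float(c) for c in xy]] for node, xy in pos.items()]
    with open(path, "w") as f:
        json.dump(pairs, f, indent=4)

def load_pos(path):
    # Rebuild the dictionary; labels keep the type they had when saved.
    with open(path) as f:
        return {node: tuple(xy) for node, xy in json.load(f)}

# Round trip with made-up positions and mixed label types.
example_pos = {"0101": [0.0, 0.0], "0102": [0.5, -0.25], 103: [1.0, 0.75]}
path = os.path.join(tempfile.mkdtemp(), "pos.json")
save_pos(example_pos, path)
restored = load_pos(path)
```

The restored dictionary can be passed to nx.draw through pos in the same way as the output of nx.spring_layout.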

