为每个节点 igraph 有效地提取具有最大权重的边

Efficiently extract edges with biggest weight for each node igraph

我有一个加权无向简单图。
节点的数量在数万左右,边的数量也是如此(它是一个稀疏图/矩阵)。

对于每个节点,我想找到最大边权重(“最大分数”)并将其与共享该边的节点一起存储在数据框中。数据框将包含三列:node_name - str,max_score - float [0-1],max_score_nodes - List[str]

下面提供了我当前的解决方案,但它并不优雅,for 循环中有多个列表理解(其中一个是嵌套的),检查没有边缘的节点等,我觉得有一种更聪明的方法这个。

import string
import igraph as ig
import numpy as np
import pandas as pd

nodes = list(string.ascii_letters[0:6].upper())
edges = [("A", "F"), ("A", "C"), ("F", "D"), ("D", "C"), ("D", "E")]
weights = [0.6, 0.4, 0.3, 0.9, 0.9]

w_graph = ig.Graph(directed=False)
w_graph.add_vertices(nodes)
w_graph.add_edges(edges, {"weight": weights})

records = {}
for node in nodes:
    local_edges = np.array(w_graph.vs.find(node).all_edges())
    if local_edges.size == 0:
        records.update({node: {"max_score": 0, "max_score_nodes": np.nan}})
        continue
    local_weights = [local_edge["weight"] for local_edge in local_edges]

    max_score = np.max(local_weights)
    max_score_ind = np.where(local_weights == max_score)[0]
    max_score_edges = local_edges[max_score_ind]

    vertex_tuples = [edge.vertex_tuple for edge in max_score_edges]
    max_score_nodes = [
        [vertex["name"] for vertex in vertex_tuple if vertex["name"] != node][0] for
        vertex_tuple in vertex_tuples]

    records.update({node: {"max_score": max_score, "max_score_nodes": max_score_nodes}})

output = pd.DataFrame.from_dict(records, orient="index")
output_with_node_name = output.rename_axis("node_name").reset_index()
print(output_with_node_name)

我发现 SciPy 的 csr matrix 可以有效地处理压缩行格式的稀疏矩阵。

有一个 igraph method,它 return 将图形的邻接矩阵作为 CSR 矩阵。

从这里我们可以在行上使用向量化操作,因此整个嵌套 for 循环现在减少为 4 行:

scores_matrix = w_graph.get_adjacency_sparse(attribute="weight")

node_names = w_graph.vs["name"]
max_scores = scores_matrix.max(axis=0).toarray()[0]
max_score_nodes = w_graph.vs[scores_matrix.argmax(axis=0).tolist()[0]]["name"]
max_score_nodes = np.where(max_scores == 0, np.nan, max_score_nodes)

output = pd.DataFrame({"node_name": node_names,
                       "max_score": max_scores,
                       "max_score_nodes": max_score_nodes})
print(output)

问题是 argmax 将 return 只有一个索引,如果你想拥有所有 node_names 和 max_score 你可以像这样循环:

from scipy.sparse import find

max_score_nodes = []
for node_index in w_graph.vs.indices:
    max_score = max_scores[node_index]
    if max_score == 0:
        max_score_nodes.append(np.nan)
        continue

    max_score_node_ind = find(scores_matrix.getrow(node_index) == max_score)[1]
    max_score_node_names = w_graph.vs[max_score_node_ind.tolist()]["name"]
    max_score_nodes.append(max_score_node_names)

output["max_score_nodes"] = max_score_nodes
print(output)

您可以像这样构建图表:

import igraph as ig
from igraph import Graph
g = Graph.Formula('A, B, C, D, E, F, A-F, A-C, F-D-C, D-E')
g.es['weight'] = [0.6, 0.4, 0.3, 0.9, 0.9]

获取每个顶点的最高权重边的索引:

strongest_edges = [max(edges, key=g.es['weight'].__getitem__) for edges in g.get_inclist()]

获取每条边的端点:

[g.es[eid].tuple for eid in strongest_edges]