python: finding input to output paths in networkX

Edit: I am now looking into how to count the number of "looping paths" for each node.

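As the title says, I am trying to write a function that counts the number of "signal paths" for any node in a network. A signal path of a node is a path from one of several inputs to one of several outputs that passes through that node. I am using an existing algorithm called all_simple_paths, a generator that yields every simple path from an input to an output.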

However, even though my code looks correct, the results I get are wrong. Here is the function:

from networkx import all_simple_paths

def signal_path_counter(G, inputs, outputs, node):
    c = 0
    paths = []
    for out in outputs:
        for i in inputs:
            for path in all_simple_paths(G, i, out):
                paths.append(path)
    for path in paths:
        for n in path:
            if(node == n):
                c += 1
    return c

Here is the input data:

import networkx as nx
import matplotlib.pyplot as plt
G=nx.DiGraph()
molecules = ["CD40L", "CD40", "NF-kB", "XBP1", "Pax5", "Bach2", "Irf4", "IL-4", "IL-4R", "STAT6", "AID", "Blimp1", "Bcl6", "ERK", "BCR", "STAT3", "Ag", "STAT5", "IL-21R", "IL-21", "IL-2", "IL-2R"]
Bcl6 = [("Bcl6", "Bcl6"), ("Bcl6", "Blimp1"), ("Bcl6", "Irf4")]
STAT5 = [("STAT5", "Bcl6")]
IL_2R = [("IL-2R", "STAT5")]
IL_2 = [("IL-22", "IL-2R")]
BCR = [("BCR", "ERK")]
Ag = [("Ag", "BCR")]
CD40L = [("CD40L", "CD40")]
CD40 = [("CD40", "NF-B")]
NF_B = [("NF-B", "Irf4"), ("NF-B", "AID")]
Irf4 = [("Irf4", "Bcl6"), ("Irf4", "Pax5"), ("Irf4", "Irf4"), ("Irf4", "Blimp1")]
ERK = [("ERK", "Bcl6"), ("ERK", "Blimp1"), ("ERK", "Pax5")]
STAT3 = [("STAT3", "Blimp1")]
IL_21 = [("IL-21", "IL-21R")]
IL_21R = [("IL-21R", "STAT3")]
IL_4R = [("IL-4R", "STAT6")]
STAT6 = [("STAT6", "AID"), ("STAT6", "Bcl6")]
Bach2 = [("Bach2", "Blimp1")]
IL_4 = [("IL-4", "IL-4R")]
Blimp1 = [("Blimp1", "Bcl6"), ("Blimp1", "Bach2"), ("Blimp1", "Pax5"), ("Blimp1", "AID"), ("Blimp1", "Irf4")]
Pax5 = [("Pax5", "Pax5"), ("Pax5", "AID"), ("Pax5", "Bcl6"), ("Pax5", "Bach2"), ("Pax5", "XBP1"), ("Pax5", "ERK"), ("Pax5", "Blimp1")]
edges = (Bcl6 + STAT5 + IL_2R + IL_2 + BCR + Ag + CD40L + CD40 + NF_B + Irf4 +
         ERK + STAT3 + IL_21 + IL_21R + IL_4R + STAT6 + Bach2 + IL_4 + Blimp1 + Pax5)
G.add_nodes_from(molecules)
G.add_edges_from(edges)
sources = ["Ag", "CD40L", "IL-2", "IL-21", "IL-4"]
targets = ["XBP1", "AID"]

A visual representation of the input network is here.

The function call that gives the incorrect result 0:

print(signal_path_counter(G, sources, targets, "IL-2R"))

Your typo is on this line:

IL_2 = [("IL-22", "IL-2R")]

It should be

IL_2 = [("IL-2", "IL-2R")]
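This kind of typo is easy to miss because add_edges_from silently creates any node that does not exist yet, so a misspelled name raises no error. As a rough sketch (the helper name undeclared_nodes is made up for illustration and is not part of networkx), you could compare the edge endpoints against the declared node list before building the graph:

```python
def undeclared_nodes(declared, edges):
    """Return edge endpoints that are missing from the declared node list."""
    declared = set(declared)
    endpoints = {n for edge in edges for n in edge}
    return sorted(endpoints - declared)


# Tiny illustrative data, not the full network from the question.
molecules = ["IL-2", "IL-2R", "STAT5"]
edges = [("IL-22", "IL-2R"), ("IL-2R", "STAT5")]

print(undeclared_nodes(molecules, edges))  # ['IL-22']
```

Anything this reports is a name that appears in an edge but not in your node list, which is usually a misspelling on one side or the other.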

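There are a few things you could do to make your code more "pythonic". Iterating over several combinations can be done more cleanly using this approach, which would replace the loops over out and i with
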
for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        paths.append(...)

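Also, rather than building up paths and then looping over paths to test whether the node is in each one, you can test directly instead of appending to paths: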

for input, output in itertools.product(inputs, outputs):
    for path in all_simple_paths(G, input, output):
        if node in path:
            c += 1

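Even this code could, I think, be made cleaner with a Counter. Basically, if you ever find yourself doing variable += 1, or appending elements to a list while iterating, there is usually a "more pythonic" way to do it.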
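Here is a minimal sketch of what the Counter version might look like. The function name signal_path_counts and the small four-node graph are made up for illustration. Instead of counting for a single node, it tallies every node in one pass, so you can look up any node afterwards:

```python
import itertools
from collections import Counter

import networkx as nx


def signal_path_counts(G, inputs, outputs):
    """Count, for every node, the simple input-to-output paths containing it."""
    counts = Counter()
    for source, target in itertools.product(inputs, outputs):
        for path in nx.all_simple_paths(G, source, target):
            # A simple path visits each node at most once, so each node
            # on the path is incremented exactly once here.
            counts.update(path)
    return counts


# Small illustrative graph: two routes from "a" to "d".
G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")])
counts = signal_path_counts(G, ["a"], ["d"])
print(counts["b"], counts["a"])  # 1 2
```

A Counter returns 0 for keys it has never seen, so nodes that lie on no signal path need no special handling.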

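I am worried about how well this algorithm scales to larger networks. Finding all paths is expensive. It would probably be better to start from node and build all the paths from node to outputs, as well as all the paths from inputs to node. Then convert each path into a set [converting to sets makes the next step faster]. Then go through the incoming and outgoing paths and check whether they intersect. Both halves contain node itself, so if they share no other node, you have a path through node.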

This should significantly reduce the number of paths you end up having to consider (and probably their lengths as well).
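The idea above can be sketched roughly as follows. The names signal_path_counter_split and _simple_path_sets are hypothetical, and I have not measured whether this is actually faster on your network, so treat it as a starting point rather than a finished implementation. The case where node is itself an input or an output is handled explicitly, since the two halves then include the trivial one-node path:

```python
import networkx as nx


def _simple_path_sets(G, source, target):
    """Simple paths from source to target, each converted to a set of nodes."""
    if source == target:
        # A node reaches itself via the trivial one-node path.
        return [{source}]
    return [set(p) for p in nx.all_simple_paths(G, source, target)]


def signal_path_counter_split(G, inputs, outputs, node):
    """Count simple input-to-output paths through node by pairing half paths."""
    incoming = [p for src in inputs for p in _simple_path_sets(G, src, node)]
    outgoing = [p for out in outputs for p in _simple_path_sets(G, node, out)]
    # Both halves contain node, so a pair joins into a simple path
    # only when node is the single node they have in common.
    return sum(1 for a in incoming for b in outgoing if a & b == {node})


# Illustrative graph in which "b" and "c" point at each other.
G = nx.DiGraph([("a", "b"), ("a", "c"), ("b", "d"),
                ("c", "d"), ("b", "c"), ("c", "b")])
print(signal_path_counter_split(G, ["a"], ["d"], "b"))  # 3
```

In this example the pair (a, c, b) and (b, c, d) is rejected because both halves contain c, which leaves the three simple paths a-b-d, a-b-c-d and a-c-b-d.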