大图上的简单路径查询

Question

我有一个关于大图数据的问题。假设我们有一个包含近 1 亿条边和大约 500 万个节点的大图，在这种情况下，您所知道的最好的图挖掘平台是什么，它可以给出所有长度 <=k 的简单路径（对于 k=3,4 ,5) 任意两个给定节点之间。主要关注的是获取这些路径的速度。另一件事是图形是有向的，但我们希望程序在计算路径时忽略方向，但一旦发现这些路径，仍然 return 实际有向边。

例如：

a -> c <- d -> b 是节点 'a' 和 'b' 之间长度为 3 的有效路径。

提前致谢。

Answer 1

我建议使用 Gephi 易于处理和学习。

如果您找到它，尽管 Neo4j 将通过一些编码来满足您的要求。

Answer 2

所以这是在 networkx 中执行此操作的一种方法。大致是根据我给出的解决方案。我假设 a->b 和 a<-b 是您想要的两条不同的路径。我打算 return 这是一个列表列表。每个子列表都是路径的（有序的）边。

import networkx as nx
import itertools

def getPaths(G,source,target, maxLength, excludeSet=None):
    #print source, target, maxLength, excludeSet
    if excludeSet== None:
        excludeSet = set([source])
    else:
        excludeSet.add(source)# won't allow a path starting at source to go through source again.
    if maxLength == 0:
        excludeSet.remove(source)
        return []
    else:
        if G.has_edge(source,target):
            paths=[[(source,target)]]
        else:
            paths = []
        if G.has_edge(target,source):
            paths.append([(target,source)])
        #neighbors_iter is a big iterator that will give (neighbor,edge) for each successor of source and then for each predecessor of source.

        neighbors_iter = itertools.chain(((neighbor,(source,neighbor)) for neighbor in G.successors_iter(source) if neighbor != target),((neighbor,(neighbor,source)) for neighbor in G.predecessors_iter(source) if neighbor != target))

        #note that if a neighbor is both a predecessor and a successor, it shows up twice in this iteration.  

        paths.extend( [[edge] + path for (neighbor,edge) in neighbors_iter if neighbor not in excludeSet for path in getPaths(G,neighbor,target,maxLength-1,excludeSet)] )

        excludeSet.remove(source) #when we move back up the recursion, don't want to exclude this source any more

        return paths

G=nx.DiGraph()
G.add_edges_from([(1,2),(2,3),(1,3),(1,4),(3,4),(4,3)])

print getPaths(G,1,3,2)

>[[(1, 3)], [(1, 2), (2, 3)], [(1, 4), (4, 3)], [(1, 4), (3, 4)]]

我希望通过修改 networkx 中的 dijkstra 算法，您会得到一个更高效的算法（请注意，dijkstra 算法有一个截止点，但默认情况下它只会到达 return 最短路径, 并且它会沿着边的方向）。

这是整个 paths.extend 的另一个版本： paths.extend( [[edge] + path for (neighbor,edge) in neighbors_iter if neighbor not in excludeSet for path in getPaths(G,neighbor,target,maxLength-1,excludeSet) if len(path )>0 ] )

大图上的简单路径查询

Simple path queries on large graphs

data-mining

large-data

networkx

jung

spark-graphx