当只能通过遍历发现数据结构时,如何使用 Python 枚举有向无环图结构中的所有路径?

How do I enumerate all paths in a Directed Acyclic Graph structure using Python, when the data structure can only be discovered via traversal?

请允许我在开始之前澄清一些细节:

  1. 为了自己研究这个问题,我真的很努力。我看过 this or this 之类的解决方案,但 所有这些解决方案都假设您在内存中拥有整个数据结构
  2. 这是一些实用的东西,不是家庭作业或工作面试问题。我试图想出一个解决方案,但一直无法解决,尽管我知道这在理论上是可行的。
  3. 细节:我正在使用 python/Selenium 遍历 Twine 文件,但是 我展示的是通用伪代码接口 因为这些细节对于解决问题并不重要这个问题,只是为了证明我需要遍历路径才能发现它。

所以,我有一个包含大约 150 个节点的有向无环图。有一个起始节点和多个结束节点。

假设我有一个类似这样的界面来遍历图:

class TheGraphToTraverse:

    def get_the_first_node(self):
        pass

    def get_the_list_of_edges_from_the_current_node(self):
        """
        Returns list of links/edges that can be followed
        """

    def follow_this_edge(self, edge):
        """
        Follow an edge link to its destination node
        """

    def go_back_one_step(self):
        """
        Navigate back one step
        """

    def go_back_to_the_start_node(self):
        """
        Start over from the beginning node
        """

    def is_this_node_an_end_node(self):
        """
        Is the current node an end node with no further edges to traverse?
        """

我将如何调整(递归或非递归)算法,如 these 但不需要将图形本身传递给递归函数:

data = {1 : [2,3],   # Directed acyclic graph adjacency list
        2 : [3],
        3 : [4,5],
        4 : [5]}

def dfs(data, path, paths = []):   
    datum = path[-1]              
    if datum in data:
        for val in data[datum]:
            new_path = path + [val]
            paths = dfs(data, new_path, paths)
    else:
        paths += [path]
    return paths

看看这个算法(以及我在网上能找到的所有算法)是如何让你传递整个数据结构的?我没有图作为数据结构,需要在遍历过程中发现它,所以我需要能够在当前路径上保持某种状态,但我无法清楚地看到如何可靠地存储该状态.

我已经走到这一步了:

    list_of_all_paths = []
    g = TheGraphToTraverse()
    first_node = g.get_the_first_node()
    list_of_all_paths = dfs(g, first_node, list_of_all_paths)

    def dfs(g: TheGraphToTraverse, path, all_paths):

        # check for end state
        if g.is_this_node_an_end_node:
            all_paths += [path]
            g.go_back_one_step()
            # I recognize I'm not sure how to store 
            # the previous edges I've stored for this path
            all_paths = dfs(g, path, all_paths)
        else:
            edges = g.get_the_list_of_edges_from_the_current_node()
            # how do I know which of these I've already tried?
            # how do I know if this is a list of 
            # edges I've already tried?
            for edge in edges:
                g.follow_this_edge(edge)
                all_paths = dfs(g, path, all_paths)
        return all_paths

任何帮助或改编的算法,即使它不在 Python 中或不是递归的都可以。非常感谢您参与其中。

您目前没有公开获取除第一个节点以外的任何节点的方法,所以我假设您只是在寻找边?

我写了一个生成器,因为我发现在递归函数中使用列表往往会导致列表在不应该发生突变时发生突变,反之亦然。出于同样的原因,我使用元组而不是边序列列表。

def dfs(g: TheGraphToTraverse, current_path=()):
    '''Depth first search on an unbuilt graph. ``current_path`` is a tuple of edges, forming a single traversal path.

    Returns an iterator of tuples of edges.
    '''
    if g.is_this_node_an_end_node():
        yield current_path
    else:
        edges = g.get_the_list_of_edges_from_the_current_node()
        for edge in edges:
            # A "sandwich" of graph traversal management with the recursion in between.
            # This ensures that every follow_this_edge is paired with a go_back_one_step.
            g.follow_this_edge(edge)
            yield from dfs(g, current_path + (edge,))
            g.go_back_one_step()

g = TheGraphToTraverse()
list_of_all_paths = list(dfs(g))  # type: list[tuple[edge_type, ...]]

根据您对问题的描述和评论讨论,我非常有信心您只需要以下抽象:

def get_edges_from(node):
    # examine the current node to determine where we can go.
    # returns a list of graph edges to follow.
    # If the logic for following an edge requires the current node,
    # then include that information in the 'edge' representation.

def get_node(edge):
    # follow an edge and return the node at the other end.

我们再看一下原代码:

def dfs(data, path, paths = []):   
    datum = path[-1]              
    if datum in data:
        for val in data[datum]:
            new_path = path + [val]
            paths = dfs(data, new_path, paths)
    else:
        paths += [path]
    return paths

这里,data是图的表示,它允许一种简单的方法直接查找后继节点:data[datum]datum 是我们递归中的“当前”节点(我们不需要任何工具来从外部跟踪它,因为它隐含在递归过程中;我们不需要“返回”的方法,因为递归也处理)。我们已经将该过程抽象为在节点上运行的全局函数,因此我们不需要将其传入。相反,我们只需替换查找。

另外,我们不需要特殊的方式来告诉我们节点是否是叶子;我们只是检查我们是否真的从那里递归。现有代码检查 if datum in data: 来执行此操作(如果结构包含后继节点的空列表,这将导致错误)。我们将在本地使用标志逻辑。

最后,我想清理 returnpaths 值的逻辑,因为它已经被变异以存储整体结果。我还删除了默认值,因为可变默认值 are bad.

应用这些更改以及一些外观名称更改,可以得到:

def dfs(path, paths):   
    current_node = path[-1]
    is_leaf = True
    for edge in get_edges_from(current_node):
        new_path = path + [get_node(edge)]
        dfs(data, new_path, paths)
        is_leaf = False
    if is_leaf:
        paths.append(path)

然后我们可以添加一个助手来启动递归:

def dfs_from(root):
    paths = []
    dfs([root], paths) # populates the `paths` list
    return paths

当然可以改进这一点。当前算法不缓存连接结果:也就是说,如果有多种方式到达图中间的节点,算法将重复遍历从该节点向前的每条路径,对于通向到的每条路径它。如果我从头开始,我肯定也会采用递归生成器方法,而不是构建递归传递的列表。我只是想展示实际需要的修改有多么少。