递归地在 python 中定义多分支嵌套树

Question

我正在尝试在 Python 中实现一些基本的递归结构，但没有取得很大的成功。我有一棵以嵌套列表形式表示的树：

ex = ['A',
      ['A1',
       ['A11', 'tag'],
       ['A12', 'tag'],
       ['A13',
        ['A131', 'tag'],
        ['A132',
         ['A1321', 'tag'],
         ['A1322', 'tag']]]],
      ['A2', 'tag'],
      ['A3',
       ['A31',
        ['A311', 'tag'],
        ['A312', 'tag']],
       ['A32', 'tag'],
       ['A33',
        ['A331',
         ['A3311', 'tag'],
         ['A3312', 'tag']]],
       ['A34', 'tag'],
       ['A35',
        ['A351', 'tag'],
        ['A352',
         ['A3521', 'tag'],
         ['A3522', 'tag']]]],
      ['A4', 'tag']]

并且我定义了一个节点 class，它允许指定标签 'A', 'A1', ... 并添加 children。可以通过注意 children 不是列表来检索终端节点。

class Node(object):
    def __init__(self, tag, parent=None, children=[]):
        self.tag = tag
        self.parent = parent
        self.children = children
        self.is_terminal = False if isinstance(children, list) else True

    def add_child(self, node):
        if not isinstance(node, Node):
            raise ValueError("Cannot append node of type: [%s]" % type(node))
        if self.is_terminal:
            raise ValueError("Cannot append node to terminal")
        else:
            self.children.append(node)

现在我无法实现一个将 list-based 树递归转换为 Node-based 树的函数：

tree = Node(tag='A', 
            children=[Node(tag='A1',
                           children=[Node(tag='A11',
                                          children='tag'),
                                     Node(tag='A12',
                                          children='tag'),
                                     ...]),
                      ...])

到目前为止，这是我的尝试，基于这样的想法，即在嵌套列表中的每个位置，我们可能有一个终端节点，在这种情况下我们只需将它添加到根节点，或者 non-terminal ，在这种情况下，我们提取相应的根标签并递归迭代 children。当列表为空时，我们 return 将控制权交给调用者。我的感觉是编码风格可能不是最适合Python，但我想更具体地知道我缺少什么。

def is_terminal(e):
    return len(e) == 2 and type(e[0]) == str and type(e[1]) == str

def from_list(lst, root):
    lst = list(lst) # avoid mutating input list
    if not lst:
        return
    for e in lst:
        if is_terminal(e):
            tag, children = e
            print "terminal", tag, "with root", root.tag
            root.add_child(Node(tag=tag, children=children, parent=root))
        else:
            e = list(e)
            tag, children = e.pop(0), e
            print "non terminal", tag, "with root", root.tag
            root = Node(tag=tag, parent=root)
            from_list(children, root=root)

它有很多问题。例如，它失去了对最高根 'A' 的跟踪 - 即A2 获得 A1 作为 root。它还将树展平为具有 16 children 的节点，每个终端节点一个，并进入无限递归。我将不胜感激任何类型的提示。

Answer 1

我终于找到了问题所在，这部分是算法中的一个缺失点，部分是对 Python 列出工作方式的误解。

算法丢失了对最高根的跟踪，因为我没有在 else 语句中添加子根。这个新版本解决了这个问题。

else:
    e = list(e)
    tag, children = e.pop(0), e
    print "non terminal", tag, "with root", root.tag
    subroot = Node(tag=tag, parent=root)
    root.add_child(subroot) #
    from_list(children, root=subroot)

扁平化的问题实际上是我在 Node class 定义中使用 [] 作为默认参数。正如 here 所解释的那样，默认空列表仅在第一次函数调用（或在本例中为 class 实例化）时创建，而不是在每次调用函数时创建。因此，根的子列表添加了所有子子（因此产生了扁平化效果），并且每次修改根子列表时，所有子子的子列表都会被修改 - 因此无限递归。

结果更像是一个 Python-gotcha，而不是算法定义问题。

作为记录，完整更正的代码版本：

class Node(object):
    def __init__(self, tag, parent=None, children=None):
        self.tag = tag
        self.parent = parent
        self.children = [] if not children else children
        self.is_terminal = False if isinstance(self.children, list) else True

    def add_child(self, node):
        if not isinstance(node, Node):
            raise ValueError("Cannot append node of type: [%s]" % type(node))
        if self.is_terminal:
            raise ValueError("Cannot append node to terminal")
        else:
            self.children.append(node)

def is_terminal(e):
    return len(e) == 2 and type(e[0]) == str and type(e[1]) == str

def from_list(lst, root):
    lst = list(lst)
    if not lst:
        return
    for e in lst:
        if is_terminal(e):
            tag, children = e
            print "terminal", tag, "with root", root.tag
            root.add_child(Node(tag=tag, children=children, parent=root))
        else:
            e = list(e)
            tag, children = e.pop(0), e
            print "non terminal", tag, "with root", root.tag
            newroot = Node(tag=tag, parent=root)
            root.add_child(newroot)
            from_list(children, root=newroot)

你是这样称呼它的：

root = Node(tag=ex[0])
from_list(ex[1:], root=root)

递归地在 python 中定义多分支嵌套树

defining multi branching nested tree in python recursively

python

tree

recursion