Read data from a pandas DataFrame and create a tree using anytree in Python

Is there a way to read data from a pandas DataFrame and build a tree from it using anytree?

Parent Child
A      A1
A      A2
A2     A21

I can do it with static values, as shown below. However, I want to automate this by reading the data from a pandas DataFrame and building the tree with anytree.

>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)
>>> for pre, _, node in RenderTree(A):
...     print("%s%s" % (pre, node.name))

The output is

A
├── A1
└── A2
    └── A21

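This question, and especially the answer, has been adapted (really, copied) from:

Many thanks to @Fabien N

For details, please refer to @Fabian N's answer.

Below is my adaptation of his answer, which read from an external file, so that it works with a pandas DataFrame: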

    from anytree import Node, RenderTree

    # df is expected to have 'Parent' and 'child' columns.
    # Root rows have child == Parent: I modified the DataFrame by concatenating
    # a dataframe of all the roots in my data, with each root copied into both
    # the parent and child columns. This can be skipped by statically setting
    # the roots, only making sure the assumption highlighted by @Fabien in the
    # above quoted answer still holds true (the entries are in such an order
    # that a parent node was always introduced as a child of another node
    # beforehand).
    for index, row in df.iterrows():
        if row['child'] == row['Parent']:
            # Root row: start a new tree and a fresh name -> Node lookup
            root = Node(row['Parent'])
            nodes = {root.name: root}
        else:
            # Ordinary row: attach the child to its already-known parent
            name = str(row['child']).strip()
            nodes[name] = Node(name, parent=nodes[row['Parent']])

    # Render the tree built from the most recent root row
    for pre, _, node in RenderTree(root):
        print("%s%s" % (pre, node.name))
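The comments in the code above describe adding root rows (where Parent equals child) to the DataFrame, but do not show how. One possible way to do it is sketched below. It assumes the same `Parent` and `child` column names as the snippet above; the sample data and the names `root_names`, `root_rows` and `df_with_roots` are made up for illustration.

```python
import pandas as pd

# Sample edge list using the same column names as the snippet above.
df = pd.DataFrame({"Parent": ["A", "A", "A2"], "child": ["A1", "A2", "A21"]})

# A root is a parent that never appears in the child column.
root_names = df.loc[~df["Parent"].isin(df["child"]), "Parent"].unique()

# Copy each root into both columns, as the comments above describe.
root_rows = pd.DataFrame({"Parent": root_names, "child": root_names})

# Put the root rows first so a root is always created before its descendants.
df_with_roots = pd.concat([root_rows, df], ignore_index=True)
print(df_with_roots)
```

Placing the root rows first only satisfies the ordering assumption for the roots themselves; the remaining rows still have to list each parent before its own children.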

If there is a better way to achieve the above, please post an answer and I will accept it as the solution.

Many thanks to @Fabian N.

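First create the nodes if they do not exist yet, and store their references in a dictionary `nodes` for further use. Set or change the parent of a child when necessary. We can derive the roots of the forest by finding the Parent values that do not appear among the Child values: a root is not the child of any node, so it never shows up in the Child column.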

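import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    # Create each endpoint the first time it is seen
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    # Attach (or re-attach) the child under its parent
    nodes[child].parent = nodes[parent]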

data = pd.DataFrame(columns=["Parent","Child"], data=[["A","A1"],["A","A2"],["A2","A21"],["B","B1"]])
nodes = {}  # store references to created nodes 
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)

roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:         # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))

Result:

A
├── A1
└── A2
    └── A21
B
└── B1

Update, to print a specific root:

root = 'A' # change according to your use case
for pre, _, node in RenderTree(nodes[root]):
    print("%s%s" % (pre, node.name))