Read data from a pandas DataFrame and create a tree using anytree in Python
Is there a way to read data from a pandas DataFrame and build a tree using anytree?
Parent Child
A A1
A A2
A2 A21
I can do this with static values, as shown below. However, I want to automate it by reading the data from a pandas DataFrame and building the tree with anytree.
>>> from anytree import Node, RenderTree
>>> A = Node("A")
>>> A1 = Node("A1", parent=A)
>>> A2 = Node("A2", parent=A)
>>> A21 = Node("A21", parent=A2)
The output is:
A
├── A1
└── A2
    └── A21
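This question, and especially the answer, was adapted (copied, really) from an answer by @Fabien N, to whom many thanks are due.
For details, please refer to @Fabian N's original answer.
Below is my adaptation of his answer, which worked with an external file, for use with a pandas DataFrame: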
from anytree import Node, RenderTree

# df is expected to have the columns 'Parent' and 'child'.
df['Parent_child'] = df['Parent'] + ',' + df['child']  # column of comma-separated Parent,child
nodes = {}  # created once, so earlier entries survive when a new root is seen
for index, row in df.iterrows():
    # I modified the DataFrame by concatenating a dataframe of all the roots
    # in my data, then copied it into both the parent and child columns.
    # This can be skipped by statically setting the roots, only making sure
    # the assumption highlighted by @Fabien in the above quoted answer still
    # holds true (this assumes that the entries are in such an order that a
    # parent node was always introduced as a child of another node beforehand).
    if row['child'] == row['Parent']:
        root = Node(row['Parent'])
        nodes[root.name] = root
    else:
        line = row['Parent_child'].split(",")
        name = "".join(line[1:]).strip()
        nodes[name] = Node(name, parent=nodes[line[0]])

# Renders the tree under the last root that was encountered.
for pre, _, node in RenderTree(root):
    print("%s%s" % (pre, node.name))
If there is a better way to achieve the above, please post an answer and I will accept it as the solution.
Many thanks to @Fabian N.
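Create each node first if it does not already exist, and store the references in a dictionary, nodes, for later use. Set each child's parent as needed. The roots of the forest can be derived by finding the Parent values that do not appear among the Child values: a root is not the child of any node, so it never appears in the Child column.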
import pandas as pd
from anytree import Node, RenderTree

def add_nodes(nodes, parent, child):
    # Create either node on first sight, then attach the child to its parent.
    if parent not in nodes:
        nodes[parent] = Node(parent)
    if child not in nodes:
        nodes[child] = Node(child)
    nodes[child].parent = nodes[parent]

data = pd.DataFrame(columns=["Parent","Child"], data=[["A","A1"],["A","A2"],["A2","A21"],["B","B1"]])
nodes = {}  # store references to created nodes
# data.apply(lambda x: add_nodes(nodes, x["Parent"], x["Child"]), axis=1)  # 1-liner
for parent, child in zip(data["Parent"],data["Child"]):
    add_nodes(nodes, parent, child)

# Roots are the Parent values that never appear in the Child column.
roots = list(data[~data["Parent"].isin(data["Child"])]["Parent"].unique())
for root in roots:  # you can skip this for roots[0], if there is no forest and just 1 tree
    for pre, _, node in RenderTree(nodes[root]):
        print("%s%s" % (pre, node.name))
Result:
A
├── A1
└── A2
    └── A21
B
└── B1
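The root derivation described in this answer can be checked on the raw (parent, child) pairs before any nodes are built. The sketch below is only an illustration: summarize_edges is a hypothetical helper name, not part of pandas or anytree, and it assumes the pairs are plain hashable values such as strings. It also reports children listed under more than one parent, since assigning .parent a second time moves the node rather than raising an error.

```python
def summarize_edges(edges):
    """Return (roots, multi_parent) for an iterable of (parent, child) pairs."""
    edges = list(edges)
    children = {child for _, child in edges}

    # Roots: parents that never occur as a child, in first-seen order.
    roots = []
    for parent, _ in edges:
        if parent not in children and parent not in roots:
            roots.append(parent)

    # Children that are listed under more than one distinct parent.
    parents_of = {}
    for parent, child in edges:
        parents_of.setdefault(child, set()).add(parent)
    multi_parent = sorted(c for c, ps in parents_of.items() if len(ps) > 1)

    return roots, multi_parent


edges = [("A", "A1"), ("A", "A2"), ("A2", "A21"), ("B", "B1")]
roots, multi_parent = summarize_edges(edges)
print(roots)         # ['A', 'B']
print(multi_parent)  # []
```

With a DataFrame, the same pairs could be passed as zip(data["Parent"], data["Child"]). The row order does not affect which values are reported as roots, only the order in which they are listed.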
Update: to print a specific root:
root = 'A'  # change according to use case
for pre, _, node in RenderTree(nodes[root]):
    print("%s%s" % (pre, node.name))