如何使用递归记录父子层次结构中的所有路由?
How to use recursion to record all routes in a parent child hierarchy?
我正在尝试遍历层次结构数据框并将每条可能的路线记录到另一个数据框中。这些路线可以有可变深度。
原始数据框 (df)。最高列表示父列中的值不是任何子列:
parent
child
highest
a
b
1
b
c
0
b
d
0
d
e
0
最终目标数据框:
level 3
level 2
level 1
level 0
a
b
c
a
b
d
e
这是我目前拥有的
def search(parent):
for i in range(df.shape[0]):
if(df.iloc[i,0] == parent):
search(df.iloc[i,1])
for i in range(df.shape[0]):
if(df.iloc[i,2] == 1):
search(df.iloc[i,0])
我可以遍历层次结构,但我不知道如何以我想要的格式保存它。
这可以使用 networkx 库来完成。您不需要 highest
列。从图中可以看出。
import pandas as pd
import networkx as nx
df = pd.DataFrame([ \
('a', 'b', 1), \
('b', 'c', 0), \
('b', 'd', 0), \
('d', 'e', 0)], columns=['parent', 'child', 'highest'])
parents = df.parent.values
childs = df.child
G = nx.DiGraph()
for i in range(len(parents)):
parent = parents[i]
child = childs[i]
if not parent in G.nodes:
G.add_node(parent)
if not child in G.nodes:
G.add_node(child)
G.add_edge(parent, child)
roots = [node for node in G.nodes if G.in_degree(node) == 0]
root = roots[0]
leaves = [node for node in G.nodes if G.out_degree(node) == 0]
all_paths = [list(nx.simple_paths.all_simple_paths(G, root, leaf)) for leaf in leaves]
# Flatten
all_paths = [tuple(path) for paths in all_paths for path in paths]
paths_df = pd.DataFrame(all_paths)
我将让您根据自己的喜好重命名列。
您可以使用networkx
to solve the problem. Note if you use networkx
you don't need the highest
columns. The main function to find all paths is all_simple_paths
# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
# Create network from your dataframe
#G = nx.from_pandas_edgelist(df, source='parent', target='child',
# create_using=nx.DiGraph)
# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
G.add_edge(source, target)
# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]
# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]
# Find all paths
paths = []
for root in roots:
for leaf in leaves:
for path in nx.all_simple_paths(G, root, leaf):
paths.append(path)
# Create a new dataframe
out = pd.DataFrame(paths).fillna('')
out.columns = reversed(out.add_prefix('level ').columns)
输出:
>>> out
level 3 level 2 level 1 level 0
0 a b c
1 a b d e
我正在尝试遍历层次结构数据框并将每条可能的路线记录到另一个数据框中。这些路线可以有可变深度。
原始数据框 (df)。最高列表示父列中的值不是任何子列:
parent | child | highest |
---|---|---|
a | b | 1 |
b | c | 0 |
b | d | 0 |
d | e | 0 |
最终目标数据框:
level 3 | level 2 | level 1 | level 0 |
---|---|---|---|
a | b | c | |
a | b | d | e |
这是我目前拥有的
def search(parent):
for i in range(df.shape[0]):
if(df.iloc[i,0] == parent):
search(df.iloc[i,1])
for i in range(df.shape[0]):
if(df.iloc[i,2] == 1):
search(df.iloc[i,0])
我可以遍历层次结构,但我不知道如何以我想要的格式保存它。
这可以使用 networkx 库来完成。您不需要 highest
列。从图中可以看出。
import pandas as pd
import networkx as nx
df = pd.DataFrame([ \
('a', 'b', 1), \
('b', 'c', 0), \
('b', 'd', 0), \
('d', 'e', 0)], columns=['parent', 'child', 'highest'])
parents = df.parent.values
childs = df.child
G = nx.DiGraph()
for i in range(len(parents)):
parent = parents[i]
child = childs[i]
if not parent in G.nodes:
G.add_node(parent)
if not child in G.nodes:
G.add_node(child)
G.add_edge(parent, child)
roots = [node for node in G.nodes if G.in_degree(node) == 0]
root = roots[0]
leaves = [node for node in G.nodes if G.out_degree(node) == 0]
all_paths = [list(nx.simple_paths.all_simple_paths(G, root, leaf)) for leaf in leaves]
# Flatten
all_paths = [tuple(path) for paths in all_paths for path in paths]
paths_df = pd.DataFrame(all_paths)
我将让您根据自己的喜好重命名列。
您可以使用networkx
to solve the problem. Note if you use networkx
you don't need the highest
columns. The main function to find all paths is all_simple_paths
# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
# Create network from your dataframe
#G = nx.from_pandas_edgelist(df, source='parent', target='child',
# create_using=nx.DiGraph)
# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
G.add_edge(source, target)
# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]
# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]
# Find all paths
paths = []
for root in roots:
for leaf in leaves:
for path in nx.all_simple_paths(G, root, leaf):
paths.append(path)
# Create a new dataframe
out = pd.DataFrame(paths).fillna('')
out.columns = reversed(out.add_prefix('level ').columns)
输出:
>>> out
level 3 level 2 level 1 level 0
0 a b c
1 a b d e