
How to use recursion to record all routes in a parent child hierarchy?


Original dataframe (df). The highest column marks rows whose parent value does not appear as a child in any row:

parent child highest
a b 1
b c 0
b d 0
d e 0


Desired output:

level 3 level 2 level 1 level 0
a       b       c
a       b       d       e


You can use the networkx library to solve this. Note that with networkx you don't need the highest column, because the roots can be read off the graph itself. The main function for finding all paths is all_simple_paths.

# Python env: pip install networkx
# Anaconda env: conda install networkx
import networkx as nx
import pandas as pd

# Create network from your dataframe
#G = nx.from_pandas_edgelist(df, source='parent', target='child',
#                            create_using=nx.DiGraph)

# For older versions of networkx
G = nx.DiGraph()
for _, (source, target) in df[['parent', 'child']].iterrows():
    G.add_edge(source, target)

# Find roots of your graph (a root is a node with no input)
roots = [node for node, degree in G.in_degree() if degree == 0]

# Find leaves of your graph (a leaf is a node with no output)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths
paths = []
for root in roots:
    for leaf in leaves:
        for path in nx.all_simple_paths(G, root, leaf):
            paths.append(path)

# Create a new dataframe
out = pd.DataFrame(paths).fillna('')
out.columns = reversed(out.add_prefix('level ').columns)


>>> out
  level 3 level 2 level 1 level 0
0       a       b       c        
1       a       b       d       e
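
Since the question asks about recursion specifically, here is a minimal sketch of the same idea without networkx. It is not taken from the answer above: the names edges, children and walk are made up for illustration, and it assumes the hierarchy has no cycles (a cycle would make it recurse forever).

```python
# Edges as (parent, child) pairs; with the dataframe above this could be
# built as list(zip(df['parent'], df['child'])).
edges = [('a', 'b'), ('b', 'c'), ('b', 'd'), ('d', 'e')]

# Map each parent to its children, keeping input order
children = {}
for parent, child in edges:
    children.setdefault(parent, []).append(child)

def walk(node, path=()):
    """Recursively yield every route from `node` down to a leaf."""
    path = path + (node,)
    kids = children.get(node, [])
    if not kids:              # no children: the route ends here
        yield list(path)
    for kid in kids:
        yield from walk(kid, path)

# Roots are parents that never appear as a child
all_children = {child for _, child in edges}
roots = [p for p in children if p not in all_children]

paths = [route for root in roots for route in walk(root)]
print(paths)  # [['a', 'b', 'c'], ['a', 'b', 'd', 'e']]
```

The resulting paths list has the same shape as the one built with all_simple_paths, so it can be passed to pd.DataFrame(paths).fillna('') in the same way.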