Form all possible pathways (list/dict) from a DataFrame in Python

I am trying to form all possible pathways from a DataFrame. I have the following DataFrame:

import pandas as pd 
data = {'from': ['b','c','d','e','f','g'], 'to': ['a', 'a', 'a', 'b','c','e']}
df = pd.DataFrame.from_dict(data)
print(df)

Now I want to build all possible pathways/chains from these two columns. The output should look like this:

  1. e -> b -> a
  2. f -> c -> a
  3. g -> e -> b -> a

If possible, also represent them as numbers:

  1. e -> b -> a = 5,2,1
  2. f -> c -> a = 6,3,1
  3. g -> e -> b -> a = 7,5,2,1

Update: the 'from' column can contain duplicate entries.

One way, with networkx:
import networkx as nx

# from_pandas_edgelist builds an undirected Graph unless create_using is given
G = nx.from_pandas_edgelist(df, 'from', 'to')
[[*nx.all_simple_paths(G, source=x, target='a')][0] for x in list('efg')]
# [['e', 'b', 'a'], ['f', 'c', 'a'], ['g', 'e', 'b', 'a']]
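A small side note on the snippet above, as a sketch rather than a correction: `from_pandas_edgelist` returns an undirected graph by default, so paths may walk edges "backwards". For the sample data this does not change the result for sources `e`, `f`, `g`, but it can matter for other source/target pairs. The comparison below reuses the question's data; the variable names `G_undirected` and `G_directed` are mine.

```python
import networkx as nx
import pandas as pd

data = {'from': ['b', 'c', 'd', 'e', 'f', 'g'],
        'to':   ['a', 'a', 'a', 'b', 'c', 'e']}
df = pd.DataFrame(data)

# Default: undirected Graph, so every edge can be walked in both directions
G_undirected = nx.from_pandas_edgelist(df, 'from', 'to')
# Directed: edges only lead from the 'from' node to the 'to' node
G_directed = nx.from_pandas_edgelist(df, 'from', 'to', create_using=nx.DiGraph())

# 'b' and 'c' both point to 'a'; they are only connected if direction is ignored
undirected_paths = list(nx.all_simple_paths(G_undirected, source='b', target='c'))
directed_paths = list(nx.all_simple_paths(G_directed, source='b', target='c'))

print(undirected_paths)  # [['b', 'a', 'c']]
print(directed_paths)    # []
```

If the chains are meant to follow 'from' -> 'to' only, passing `create_using=nx.DiGraph()` is probably the safer choice.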
# First, make 'from' the index of df
df.set_index('from', inplace=True)

def path(df, ch, res=None):
    # Follow the 'to' links from ch until reaching a node with no outgoing edge.
    # Assumes each 'from' value is unique; with duplicates, df.loc[ch, 'to']
    # returns a Series instead of a single label.
    res = [ch] if res is None else res
    if ch in df.index:
        nxt = df.loc[ch, 'to']
        return path(df, nxt, res + [nxt])
    return res

import string    # used below to get each character's position in the alphabet

for el in df.index:
    pth = path(df, el)
    print('->'.join(pth))

    # The numbers in the question look like each character's position in the
    # alphabet, so that is what this computes
    print([string.ascii_lowercase.index(i) + 1 for i in pth])
    print('')

Out[1]:
b->a
[2, 1]

c->a
[3, 1]

d->a
[4, 1]

e->b->a
[5, 2, 1]

f->c->a
[6, 3, 1]

g->e->b->a
[7, 5, 2, 1]

You can use recursion with a generator to form the paths, then use string.ascii_lowercase to store the results numerically:

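data = {'from': ['b','c','d','e','f','g'], 'to': ['a', 'a', 'a', 'b','c','e']}
d = list(zip(data['from'], data['to']))  # list of (from, to) edges

def get_paths(n, c = []):
    if n is None:
        # entry point: start one walk from every 'from' node
        yield from [i for k, _ in d for i in get_paths(k)]
    elif (r:=[b for a, b in d if a == n]):  # walrus operator needs Python 3.8+
        # n has outgoing edges: extend the current chain through each target
        yield from [i for k in r for i in get_paths(k, c+[n])]
    else:
        # n has no outgoing edge, so the chain ends here
        yield c+[n]

result = list(get_paths(None))
# [['b', 'a'], ['c', 'a'], ['d', 'a'], ['e', 'b', 'a'], ['f', 'c', 'a'], ['g', 'e', 'b', 'a']]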

Then, convert to integers:

from string import ascii_lowercase as al
new_results = [[al.index(b)+1 for b in i] for i in result]

Output:

[[2, 1], [3, 1], [4, 1], [5, 2, 1], [6, 3, 1], [7, 5, 2, 1]]
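Note that `ascii_lowercase.index` only works when every node is a single lowercase letter. If the real labels are longer strings, one possible alternative is to number the labels yourself. The sketch below assumes any stable numbering is acceptable and numbers the labels in sorted order starting at 1; `label_to_id` and `numeric_paths` are names I made up for illustration.

```python
# Paths copied from the output shown above
result = [['b', 'a'], ['c', 'a'], ['d', 'a'],
          ['e', 'b', 'a'], ['f', 'c', 'a'], ['g', 'e', 'b', 'a']]

# Assumption: labels are numbered in sorted order, starting at 1.
# For the labels a..g this happens to match their alphabet positions.
labels = sorted({node for p in result for node in p})
label_to_id = {label: i for i, label in enumerate(labels, start=1)}

numeric_paths = [[label_to_id[node] for node in p] for p in result]
print(numeric_paths)
# [[2, 1], [3, 1], [4, 1], [5, 2, 1], [6, 3, 1], [7, 5, 2, 1]]
```

The same dictionary can be inverted to translate numeric paths back into labels.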

Thanks for your suggestions. The following code worked for me:

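import networkx as nx

# `test` is the DataFrame with 'From' and 'To' columns; 'XYZ' is the target node
G = nx.from_pandas_edgelist(test, 'From', 'To', create_using=nx.DiGraph())
all_paths = []
y = test['From'].unique()
for var in y:
    # collect every directed simple path from this source to the target
    paths = nx.all_simple_paths(G, source=var, target='XYZ')
    all_paths.extend(paths)
all_paths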
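For anyone who wants to try this on the sample data from the question, here is a self-contained sketch of the same idea. It is an assumption-laden example, not the exact code above: the target is `'a'` instead of `'XYZ'`, and I added one hypothetical extra row (`e -> c`) so that `'e'` appears twice in the 'from' column, as mentioned in the update.

```python
import networkx as nx
import pandas as pd

# Sample data from the question plus one hypothetical extra row ('e' -> 'c'),
# so that 'e' is a duplicate entry in the 'from' column
data = {'from': ['b', 'c', 'd', 'e', 'f', 'g', 'e'],
        'to':   ['a', 'a', 'a', 'b', 'c', 'e', 'c']}
df = pd.DataFrame(data)

# Directed graph: edges only lead from 'from' to 'to'
G = nx.from_pandas_edgelist(df, 'from', 'to', create_using=nx.DiGraph())

all_paths = []
for source in df['from'].unique():
    # every directed simple path from this source to the final node 'a'
    all_paths.extend(nx.all_simple_paths(G, source=source, target='a'))

for p in sorted(all_paths):
    print(' -> '.join(p))
```

With the duplicate row, `e` and `g` each yield two chains (one through `b`, one through `c`), so duplicates in the 'from' column simply produce more paths.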