查找 children 的深度及其从 Pandas 数据框中的 parent 元素发出的 children

Find the depth of children and their children emanating from a parent element in Pandas dataframe

我有一个包含 parent_idchild_id 列的数据框。数据框的片段如下:

df = pd.DataFrame({'child_id': [11, 12, 17, 18, 19, 20, 21, 22, 32, 33, 34, 35, 41, 42, 43, 44, 44, 56, 57, 58, 59, 63, 64, 65, 66, 67, 78, 79, 80, 81],
                   'parent_id': [10, 11, 16, 17, 18, 19, 20, 21, 17, 32, 33, 17, 33, 41, 42, 43, 42, 44, 56, 57, 57, 59, 63, 64, 65, 65, 67, 78, 79, 79],
                  })

   child_id  parent_id
0        11         10
1        12         11
2        17         16
3        18         17
4        19         18
...

我想得到一个数组,显示每个 parent_id,children(推出)走了多少步。

例如,对于parent_id10,返回的数组中会有两个元素。第一个元素的值为 2,因为它有 child_id 11,然后 11 有 12。10 也开始另一个 rollout child_id 20,深入 3 步。因此,对于 10,返回的数组应具有 2 个元素 [2, 3]

13 点的下一个展示是极端案例。它产生 4 children 的 rollout,其中一个 (child_id: 33) 进一步产生 4 children,其中一个产生 (child_id: 42) 进一步 3 . 其中之一 (child_id: 57) 启动了另外两个部署。一个 rollout 有 1 child,另一个有 6 children。所以对于 child_id 13,我应该得到 [12, 17].

的展开深度

返回的数组总数应该[2, 3, 12, 17]。我正在努力将极端情况包含在解决方案中。

这是一道图形题。

您的图表如下所示:

如果我没理解错的话,你想知道从给定节点到终点的所有可能路径的长度。

例如,取图的一个子集。从 65 开始有 3 条可能的路径,长度为 1(到 66)、4(到 80)和 4(到 81)。

您可以使用networkx 来计算图形,然后使用set 操作将终端节点确定为没有子节点的节点。从每个终端节点开始,递归返回并计算每个给定节点的边,保存字典中的距离(defaultdict):

# set up graph
import networkx as nx

G = nx.from_pandas_edgelist(df, source='parent_id', target='child_id',
                            create_using=nx.DiGraph)

# compute paths
from collections import defaultdict
successors = defaultdict(list)
def get_parent(n, num=0):
    for p in G.predecessors(n):
        successors[p].append(num+1)
        get_parent(p, num+1)

parents = set(df['parent_id'])
children = set(df['child_id'])
terminal = children-parents

for t in terminal:
    get_parent(t)

print(dict(successors))

输出:

{33: [1, 11, 10, 14, 13, 14, 13, 7, 6],
 32: [2, 12, 11, 15, 14, 15, 14, 8, 7],
 17: [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8],
 16: [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9],
 65: [1, 4, 4],
 64: [2, 5, 5],
 63: [3, 6, 6],
 59: [4, 7, 7],
 57: [5, 8, 8, 1],
 56: [6, 9, 9, 2],
 44: [7, 10, 10, 3],
 43: [8, 11, 11, 4],
 42: [9, 8, 12, 11, 12, 11, 5, 4],
 41: [10, 9, 13, 12, 13, 12, 6, 5],
 11: [1],
 10: [2],
 79: [1, 1],
 78: [2, 2],
 67: [3, 3],
 21: [1],
 20: [2],
 19: [3],
 18: [4]}

最后,合并到您的原始数据框:

df.merge(pd.Series(successors, name='paths'),
         left_on='parent_id', right_index=True)

输出:

    child_id  parent_id                                     paths
0         11         10                                       [2]
1         12         11                                       [1]
2         17         16  [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9]
3         18         17   [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
8         32         17   [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
11        35         17   [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
4         19         18                                       [4]
5         20         19                                       [3]
6         21         20                                       [2]
7         22         21                                       [1]
9         33         32         [2, 12, 11, 15, 14, 15, 14, 8, 7]
10        34         33         [1, 11, 10, 14, 13, 14, 13, 7, 6]
12        41         33         [1, 11, 10, 14, 13, 14, 13, 7, 6]
13        42         41             [10, 9, 13, 12, 13, 12, 6, 5]
14        43         42              [9, 8, 12, 11, 12, 11, 5, 4]
16        44         42              [9, 8, 12, 11, 12, 11, 5, 4]
15        44         43                            [8, 11, 11, 4]
17        56         44                            [7, 10, 10, 3]
18        57         56                              [6, 9, 9, 2]
19        58         57                              [5, 8, 8, 1]
20        59         57                              [5, 8, 8, 1]
21        63         59                                 [4, 7, 7]
22        64         63                                 [3, 6, 6]
23        65         64                                 [2, 5, 5]
24        66         65                                 [1, 4, 4]
25        67         65                                 [1, 4, 4]
26        78         67                                    [3, 3]
27        79         78                                    [2, 2]
28        80         79                                    [1, 1]
29        81         79                                    [1, 1]
你的 duplicated question
结果

图表:

输出:

    child_id  parent_id   paths
0         11         10  [2, 3]
5         20         10  [2, 3]
1         12         11     [1]
2         17         16     [3]
3         18         17     [2]
4         19         18     [1]
6         21         20     [2]
7         22         21     [1]
8         32         13  [4, 8]
9         33         32  [3, 7]
10        34         33  [2, 6]
12        41         33  [2, 6]
11        35         34     [1]
13        42         41     [5]
14        43         42     [4]
15        44         43     [3]
16        56         44     [2]
17        57         56     [1]