查找 children 的深度及其从 Pandas 数据框中的 parent 元素发出的 children
Find the depth of children and their children emanating from a parent element in Pandas dataframe
我有一个包含 parent_id
和 child_id
列的数据框。数据框的片段如下:
df = pd.DataFrame({'child_id': [11, 12, 17, 18, 19, 20, 21, 22, 32, 33, 34, 35, 41, 42, 43, 44, 44, 56, 57, 58, 59, 63, 64, 65, 66, 67, 78, 79, 80, 81],
'parent_id': [10, 11, 16, 17, 18, 19, 20, 21, 17, 32, 33, 17, 33, 41, 42, 43, 42, 44, 56, 57, 57, 59, 63, 64, 65, 65, 67, 78, 79, 79],
})
child_id parent_id
0 11 10
1 12 11
2 17 16
3 18 17
4 19 18
...
我想得到一个数组,显示每个 parent_id
,children(推出)走了多少步。
例如,对于parent_id
10,返回的数组中会有两个元素。第一个元素的值为 2,因为它有 child_id
11,然后 11 有 12。10 也开始另一个 rollout child_id
20,深入 3 步。因此,对于 10,返回的数组应具有 2 个元素 [2, 3]
13 点的下一个展示是极端案例。它产生 4 children 的 rollout,其中一个 (child_id
: 33) 进一步产生 4 children,其中一个产生 (child_id
: 42) 进一步 3 . 其中之一 (child_id
: 57) 启动了另外两个部署。一个 rollout 有 1 child,另一个有 6 children。所以对于 child_id
13,我应该得到 [12, 17]
.
的展开深度
返回的数组总数应该[2, 3, 12, 17]
。我正在努力将极端情况包含在解决方案中。
这是一道图形题。
您的图表如下所示:
如果我没理解错的话,你想知道从给定节点到终点的所有可能路径的长度。
例如,取图的一个子集。从 65 开始有 3 条可能的路径,长度为 1(到 66)、4(到 80)和 4(到 81)。
您可以使用networkx
来计算图形,然后使用set
操作将终端节点确定为没有子节点的节点。从每个终端节点开始,递归返回并计算每个给定节点的边,保存字典中的距离(defaultdict
):
# set up graph
import networkx as nx
G = nx.from_pandas_edgelist(df, source='parent_id', target='child_id',
create_using=nx.DiGraph)
# compute paths
from collections import defaultdict
successors = defaultdict(list)
def get_parent(n, num=0):
for p in G.predecessors(n):
successors[p].append(num+1)
get_parent(p, num+1)
parents = set(df['parent_id'])
children = set(df['child_id'])
terminal = children-parents
for t in terminal:
get_parent(t)
print(dict(successors))
输出:
{33: [1, 11, 10, 14, 13, 14, 13, 7, 6],
32: [2, 12, 11, 15, 14, 15, 14, 8, 7],
17: [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8],
16: [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9],
65: [1, 4, 4],
64: [2, 5, 5],
63: [3, 6, 6],
59: [4, 7, 7],
57: [5, 8, 8, 1],
56: [6, 9, 9, 2],
44: [7, 10, 10, 3],
43: [8, 11, 11, 4],
42: [9, 8, 12, 11, 12, 11, 5, 4],
41: [10, 9, 13, 12, 13, 12, 6, 5],
11: [1],
10: [2],
79: [1, 1],
78: [2, 2],
67: [3, 3],
21: [1],
20: [2],
19: [3],
18: [4]}
最后,合并到您的原始数据框:
df.merge(pd.Series(successors, name='paths'),
left_on='parent_id', right_index=True)
输出:
child_id parent_id paths
0 11 10 [2]
1 12 11 [1]
2 17 16 [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9]
3 18 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
8 32 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
11 35 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
4 19 18 [4]
5 20 19 [3]
6 21 20 [2]
7 22 21 [1]
9 33 32 [2, 12, 11, 15, 14, 15, 14, 8, 7]
10 34 33 [1, 11, 10, 14, 13, 14, 13, 7, 6]
12 41 33 [1, 11, 10, 14, 13, 14, 13, 7, 6]
13 42 41 [10, 9, 13, 12, 13, 12, 6, 5]
14 43 42 [9, 8, 12, 11, 12, 11, 5, 4]
16 44 42 [9, 8, 12, 11, 12, 11, 5, 4]
15 44 43 [8, 11, 11, 4]
17 56 44 [7, 10, 10, 3]
18 57 56 [6, 9, 9, 2]
19 58 57 [5, 8, 8, 1]
20 59 57 [5, 8, 8, 1]
21 63 59 [4, 7, 7]
22 64 63 [3, 6, 6]
23 65 64 [2, 5, 5]
24 66 65 [1, 4, 4]
25 67 65 [1, 4, 4]
26 78 67 [3, 3]
27 79 78 [2, 2]
28 80 79 [1, 1]
29 81 79 [1, 1]
你的 duplicated question
结果
图表:
输出:
child_id parent_id paths
0 11 10 [2, 3]
5 20 10 [2, 3]
1 12 11 [1]
2 17 16 [3]
3 18 17 [2]
4 19 18 [1]
6 21 20 [2]
7 22 21 [1]
8 32 13 [4, 8]
9 33 32 [3, 7]
10 34 33 [2, 6]
12 41 33 [2, 6]
11 35 34 [1]
13 42 41 [5]
14 43 42 [4]
15 44 43 [3]
16 56 44 [2]
17 57 56 [1]
我有一个包含 parent_id
和 child_id
列的数据框。数据框的片段如下:
df = pd.DataFrame({'child_id': [11, 12, 17, 18, 19, 20, 21, 22, 32, 33, 34, 35, 41, 42, 43, 44, 44, 56, 57, 58, 59, 63, 64, 65, 66, 67, 78, 79, 80, 81],
'parent_id': [10, 11, 16, 17, 18, 19, 20, 21, 17, 32, 33, 17, 33, 41, 42, 43, 42, 44, 56, 57, 57, 59, 63, 64, 65, 65, 67, 78, 79, 79],
})
child_id parent_id
0 11 10
1 12 11
2 17 16
3 18 17
4 19 18
...
我想得到一个数组,显示每个 parent_id
,children(推出)走了多少步。
例如,对于parent_id
10,返回的数组中会有两个元素。第一个元素的值为 2,因为它有 child_id
11,然后 11 有 12。10 也开始另一个 rollout child_id
20,深入 3 步。因此,对于 10,返回的数组应具有 2 个元素 [2, 3]
13 点的下一个展示是极端案例。它产生 4 children 的 rollout,其中一个 (child_id
: 33) 进一步产生 4 children,其中一个产生 (child_id
: 42) 进一步 3 . 其中之一 (child_id
: 57) 启动了另外两个部署。一个 rollout 有 1 child,另一个有 6 children。所以对于 child_id
13,我应该得到 [12, 17]
.
返回的数组总数应该[2, 3, 12, 17]
。我正在努力将极端情况包含在解决方案中。
这是一道图形题。
您的图表如下所示:
如果我没理解错的话,你想知道从给定节点到终点的所有可能路径的长度。
例如,取图的一个子集。从 65 开始有 3 条可能的路径,长度为 1(到 66)、4(到 80)和 4(到 81)。
您可以使用networkx
来计算图形,然后使用set
操作将终端节点确定为没有子节点的节点。从每个终端节点开始,递归返回并计算每个给定节点的边,保存字典中的距离(defaultdict
):
# set up graph
import networkx as nx
G = nx.from_pandas_edgelist(df, source='parent_id', target='child_id',
create_using=nx.DiGraph)
# compute paths
from collections import defaultdict
successors = defaultdict(list)
def get_parent(n, num=0):
for p in G.predecessors(n):
successors[p].append(num+1)
get_parent(p, num+1)
parents = set(df['parent_id'])
children = set(df['child_id'])
terminal = children-parents
for t in terminal:
get_parent(t)
print(dict(successors))
输出:
{33: [1, 11, 10, 14, 13, 14, 13, 7, 6],
32: [2, 12, 11, 15, 14, 15, 14, 8, 7],
17: [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8],
16: [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9],
65: [1, 4, 4],
64: [2, 5, 5],
63: [3, 6, 6],
59: [4, 7, 7],
57: [5, 8, 8, 1],
56: [6, 9, 9, 2],
44: [7, 10, 10, 3],
43: [8, 11, 11, 4],
42: [9, 8, 12, 11, 12, 11, 5, 4],
41: [10, 9, 13, 12, 13, 12, 6, 5],
11: [1],
10: [2],
79: [1, 1],
78: [2, 2],
67: [3, 3],
21: [1],
20: [2],
19: [3],
18: [4]}
最后,合并到您的原始数据框:
df.merge(pd.Series(successors, name='paths'),
left_on='parent_id', right_index=True)
输出:
child_id parent_id paths
0 11 10 [2]
1 12 11 [1]
2 17 16 [4, 2, 14, 13, 17, 16, 17, 16, 6, 10, 9]
3 18 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
8 32 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
11 35 17 [3, 1, 13, 12, 16, 15, 16, 15, 5, 9, 8]
4 19 18 [4]
5 20 19 [3]
6 21 20 [2]
7 22 21 [1]
9 33 32 [2, 12, 11, 15, 14, 15, 14, 8, 7]
10 34 33 [1, 11, 10, 14, 13, 14, 13, 7, 6]
12 41 33 [1, 11, 10, 14, 13, 14, 13, 7, 6]
13 42 41 [10, 9, 13, 12, 13, 12, 6, 5]
14 43 42 [9, 8, 12, 11, 12, 11, 5, 4]
16 44 42 [9, 8, 12, 11, 12, 11, 5, 4]
15 44 43 [8, 11, 11, 4]
17 56 44 [7, 10, 10, 3]
18 57 56 [6, 9, 9, 2]
19 58 57 [5, 8, 8, 1]
20 59 57 [5, 8, 8, 1]
21 63 59 [4, 7, 7]
22 64 63 [3, 6, 6]
23 65 64 [2, 5, 5]
24 66 65 [1, 4, 4]
25 67 65 [1, 4, 4]
26 78 67 [3, 3]
27 79 78 [2, 2]
28 80 79 [1, 1]
29 81 79 [1, 1]
你的 duplicated question
结果图表:
输出:
child_id parent_id paths
0 11 10 [2, 3]
5 20 10 [2, 3]
1 12 11 [1]
2 17 16 [3]
3 18 17 [2]
4 19 18 [1]
6 21 20 [2]
7 22 21 [1]
8 32 13 [4, 8]
9 33 32 [3, 7]
10 34 33 [2, 6]
12 41 33 [2, 6]
11 35 34 [1]
13 42 41 [5]
14 43 42 [4]
15 44 43 [3]
16 56 44 [2]
17 57 56 [1]