Print array values to their own dataframe row
Here is my dataframe:
al
0 [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
1 [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
2 [[11, 12], [11, 13], [11, 14], [11, 15], [11, ...
3 [[11, 12], [11, 13], [11, 14], [11, 15], [11, ...
4 [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
... ...
43234 [5, 7, 4, [3, 8, 9], [1, 8, 10], [7, 9, 10], [...
43235 [5, 4, 6, [2, 7, 8], [1, 9, 10], [3, 7, 8, 9, ...
43236 [6, 4, 5, [2, 7, 8], [3, 6, 7, 8], [1, 5], [4,...
43237 [4, 6, 5, [1, 7, 8], [3, 6, 7, 8], [2, 5], [4,...
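Each row is a graph, the length of each row is the number of nodes, and each array or single value (as in row 43234) holds the destinations of that node. I want to create a separate df that looks like this: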
graph_id src dst
0 0 12
0 0 13
0 1 12
0 1 14
0 2 12
0 2 15
0 3 12
.
.
.
43234 0 5
43234 1 7
43234 2 4
43234 3 3
43234 3 8
.
.
.
I have tried multiple versions of this loop:
for i in range(len(df['al'])):
    for j in range(len(df['al'][i])):
        for k in range(len(df['al'][i][j])):
            df2['graph_id'] = i
            df2['src'] = j
            df2['dst'] = k
to no avail. Let me know if you need any other information.
Let's assume your dataframe has a structure similar to this one. Each cell is a list (this also works with arrays).
# dummy data
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a1': [[np.array([10,11]), np.array([12,13]), np.array([14,15])],
           [21,22,23, np.array([30,31]), np.array([32,33])]]
})
print(df)
a1
0 [[10, 11], [12, 13], [14, 15]]
1 [21, 22, 23, [30, 31], [32, 33]]
Then, to get the result, you can use explode twice, creating the src column with groupby.cumcount after the first explode. The rest of the code below is mostly cosmetic, to fit the expected output.
res = (
    df
    # create a row per item in the list of each cell
    .explode('a1')
    # rename the index per your expected output
    .rename_axis('graph_id')
    # create the column src, +1 per row within the same original row number
    .assign(src=lambda x: x.groupby(level='graph_id').cumcount())
    # explode the cells: an array becomes several rows, a scalar stays one row
    .explode('a1')
    # to fit expected output names
    .rename(columns={'a1':'dst'})
    # graph_id becomes a column
    .reset_index()
    # reorder the columns per expected output
    [['graph_id', 'src','dst']]
)
print(res)
graph_id src dst
0 0 0 10
1 0 0 11
2 0 1 12
3 0 1 13
4 0 2 14
5 0 2 15
6 1 0 21
7 1 1 22
8 1 2 23
9 1 3 30
10 1 3 31
11 1 4 32
12 1 4 33
If you are not sure what is happening, I suggest commenting out all the commands and uncommenting them one at a time to see the result of each.
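For comparison, here is a minimal sketch of an explicit-loop version that builds the same three columns. It is not part of the approach above, and it assumes each cell is a list whose items are either integer scalars or 1-D lists/arrays of integers. The helper name `to_edge_list` is hypothetical. Unlike the loop in the question, it appends one tuple per edge instead of assigning a scalar to a whole column, and it records the target value itself rather than its position.

```python
import numpy as np
import pandas as pd

def to_edge_list(df, col="a1"):
    """Hypothetical helper: one output row per (graph_id, src, dst)."""
    rows = []
    for graph_id, nodes in df[col].items():
        for src, targets in enumerate(nodes):
            # np.atleast_1d wraps a scalar in a 1-element array and
            # turns a list/array of targets into a 1-D array
            for dst in np.atleast_1d(targets):
                # assumption: node ids are integers
                rows.append((graph_id, src, int(dst)))
    return pd.DataFrame(rows, columns=["graph_id", "src", "dst"])

# same dummy data as above
df = pd.DataFrame({
    "a1": [[np.array([10, 11]), np.array([12, 13]), np.array([14, 15])],
           [21, 22, 23, np.array([30, 31]), np.array([32, 33])]]
})
edges = to_edge_list(df)
print(edges)
```

On the dummy data this should give the same rows as `res`. The explode version is likely the better choice for a dataframe with tens of thousands of rows, but the loop can be easier to step through when debugging.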