Print array values to their own dataframe row

Question

This is my dataframe:

                                             al
0   [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
1   [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
2   [[11, 12], [11, 13], [11, 14], [11, 15], [11, ...
3   [[11, 12], [11, 13], [11, 14], [11, 15], [11, ...
4   [[12, 13], [12, 14], [12, 15], [12, 16], [12, ...
...     ...
43234   [5, 7, 4, [3, 8, 9], [1, 8, 10], [7, 9, 10], [...
43235   [5, 4, 6, [2, 7, 8], [1, 9, 10], [3, 7, 8, 9, ...
43236   [6, 4, 5, [2, 7, 8], [3, 6, 7, 8], [1, 5], [4,...
43237   [4, 6, 5, [1, 7, 8], [3, 6, 7, 8], [2, 5], [4,...

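Each row is a graph, the length of each row is the number of nodes, and each array or single value (as in row 43234) holds that node's destinations. I want to create a separate df that looks like this: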

graph_id  src   dst
       0    0    12
       0    0    13
       0    1    12
       0    1    14
       0    2    12
       0    2    15
       0    3    12
.
.
.
   43234    0     5
   43234    1     7
   43234    2     4
   43234    3     3
   43234    3     8
.
.
.

I have tried multiple versions of this loop:

for i in range(len(df['al'])):
    for j in range(len(df['al'][i])):
        for k in range(len(df['al'][i][j])):
            df2['graph_id'] = i
            df2['src'] = j
            df2['dst'] = k

to no avail. Please let me know if you need any other information.

Answer 1

Let's assume your dataframe has a structure similar to this. Each cell is a list (this also works if the cell is an array).

import numpy as np
import pandas as pd

# dummy data
df = pd.DataFrame({
    'a1': [[np.array([10,11]), np.array([12,13]), np.array([14,15])], 
           [21,22,23, np.array([30,31]), np.array([32,33])]]
})

print(df)
                                 a1
0    [[10, 11], [12, 13], [14, 15]]
1  [21, 22, 23, [30, 31], [32, 33]]

Then, to get the result, you can use explode twice, and create the src column with groupby.cumcount after the first explode. The rest of the following code is mostly cosmetic, to fit the expected output.

res = (
    df
      # create a row per item in the list of each cell
      .explode('a1') 
      # rename the index per your expected output
      .rename_axis('graph_id')
      # create the column src, +1 per row within the same original row number
      .assign(src=lambda x: x.groupby(level='graph_id').cumcount())
      # explode the cells again: an array becomes several rows, a scalar stays as one row
      .explode('a1')
      # to fit expected output names
      .rename(columns={'a1':'dst'})
      # graph_id becomes a column
      .reset_index()
      # reorder the columns per expected output
      [['graph_id', 'src','dst']]
)
print(res)
    graph_id  src dst
0          0    0  10
1          0    0  11
2          0    1  12
3          0    1  13
4          0    2  14
5          0    2  15
6          1    0  21
7          1    1  22
8          1    2  23
9          1    3  30
10         1    3  31
11         1    4  32
12         1    4  33
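
As a cross-check of the output above, here is a minimal plain-loop sketch of the same transformation. It is not part of the original answer: the name `res_loop` and the use of `np.atleast_1d` to treat a scalar as a one-element array are assumptions made for illustration. It is likely to be slower than explode on a large frame, but it makes explicit that dst should hold the value stored in the cell, not its position.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the answer above.
df = pd.DataFrame({
    'a1': [[np.array([10, 11]), np.array([12, 13]), np.array([14, 15])],
           [21, 22, 23, np.array([30, 31]), np.array([32, 33])]]
})

rows = []
# One outer iteration per graph (original row).
for graph_id, nodes in df['a1'].items():
    # The position of an item inside the cell is the source node.
    for src, targets in enumerate(nodes):
        # np.atleast_1d wraps a scalar into a 1-element array and
        # leaves an existing array as is, so both cases loop the same way.
        for dst in np.atleast_1d(targets):
            rows.append((graph_id, src, int(dst)))

# Build the frame once at the end instead of assigning inside the loop.
res_loop = pd.DataFrame(rows, columns=['graph_id', 'src', 'dst'])
print(res_loop)
```

Collecting tuples in a list and building the dataframe once avoids overwriting whole columns on every iteration, which is what assigning a scalar to `df2['graph_id']` inside the loop does.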

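If you are not sure what is happening, I suggest commenting out all the chained methods and uncommenting them one at a time to see the result of each.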
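
One way to follow that suggestion is to split the chain into named intermediate steps. The sketch below does this on the same dummy data. The names `step1` to `step3` are hypothetical, and the final cast of dst to `int64` is an addition that assumes every target is an integer, since the exploded column is typically left with object dtype.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the answer above.
df = pd.DataFrame({
    'a1': [[np.array([10, 11]), np.array([12, 13]), np.array([14, 15])],
           [21, 22, 23, np.array([30, 31]), np.array([32, 33])]]
})

# Step 1: one row per item of each cell; the original row number
# is kept in the index, which is then named graph_id.
step1 = df.explode('a1').rename_axis('graph_id')
print(step1)

# Step 2: number the items within each original row to get src.
step2 = step1.assign(src=lambda x: x.groupby(level='graph_id').cumcount())
print(step2)

# Step 3: explode again so each array becomes one row per target,
# while scalar cells stay as a single row.
step3 = step2.explode('a1')
print(step3)

# Cosmetic steps, plus an optional cast (assumes integer targets only).
res = (
    step3
      .rename(columns={'a1': 'dst'})
      .reset_index()
      [['graph_id', 'src', 'dst']]
      .astype({'dst': 'int64'})
)
print(res.dtypes)
```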

python

graph-theory

dataframe

pandas