Convert jagged array to Pandas dataframe

I am trying to take a jagged 2D list like this one

l = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(1, 0.5918752), (2, 0.04440181), (4, 0.05204634), (5, 0.3066661)],
    [(1, 0.26327166), (2, 0.26078925), (4, 0.24160784), (5, 0.22958432)],
    [(2, 0.92498404), (5, 0.065140516)],
    [(1, 0.9882947)],
    [(0, 0.23412614), (1, 0.031903207), (2, 0.03044448), (3, 0.6480669), (4, 0.053342175)],
    [(0, 0.056099385), (3, 0.9084766), (5, 0.031809118)],
    [(2, 0.39833495), (4, 0.52058107), (5, 0.077259734)],
    [(0, 0.46812743), (1, 0.10643007), (3, 0.15962379), (4, 0.017917762), (5, 0.24552101)],
    [(0, 0.2556301), (1, 0.7391994)]
]

and turn it into a dataframe like this:

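In l, each row may or may not contain every column. Each tuple has the form (column_label, cell_value). If a row is missing a column, that cell should be set to 0 in the dataframe.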

I tried

topics_df = pd.DataFrame(l).fillna(0)

but this produces a dataframe that looks like this:
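For reference, here is a minimal sketch of what that call appears to do, using a hypothetical three-row `sample` rather than the full list. The assumption is that pandas reads each inner list positionally, so the tuples themselves end up as cell values and the column labels are just positions:

```python
import pandas as pd

# Hypothetical, shortened sample with the same (column_label, cell_value) layout as l
sample = [
    [(1, 0.5), (2, 0.25)],
    [(2, 0.75)],
    [(0, 0.1), (1, 0.2), (5, 0.3)],
]

# Each inner list is read positionally: the k-th tuple of a row lands in column k
naive = pd.DataFrame(sample).fillna(0)

print(naive.columns.tolist())  # positions, not the labels stored in the tuples
print(naive.iloc[0, 0])        # a whole tuple sits in the cell
print(naive.iloc[1, 1])        # short rows are padded, then filled with 0
```

So the labels inside the tuples are never used as column names, which is why the result does not match the desired layout.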

Let's reshape the list into a dict that a pandas DataFrame can interpret:

df = pd.DataFrame(dict(enumerate(list(map(dict,l))))).T.sort_index(axis=1).fillna(0)
Out[17]: 
          0         1         2         3         4         5
0  0.000000  0.865677  0.089029  0.000000  0.000000  0.040294
1  0.000000  0.591875  0.044402  0.000000  0.052046  0.306666
2  0.000000  0.263272  0.260789  0.000000  0.241608  0.229584
3  0.000000  0.000000  0.924984  0.000000  0.000000  0.065141
4  0.000000  0.988295  0.000000  0.000000  0.000000  0.000000
5  0.234126  0.031903  0.030444  0.648067  0.053342  0.000000
6  0.056099  0.000000  0.000000  0.908477  0.000000  0.031809
7  0.000000  0.000000  0.398335  0.000000  0.520581  0.077260
8  0.468127  0.106430  0.000000  0.159624  0.017918  0.245521
9  0.255630  0.739199  0.000000  0.000000  0.000000  0.000000
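The one-liner above packs several steps together. A rough step-by-step sketch of the same idea follows, run on a hypothetical three-row `sample` instead of the full list; the intermediate names (`row_dicts`, `by_row`, `wide`, `df_small`) are made up for illustration:

```python
import pandas as pd

# Hypothetical, shortened sample with the same (column_label, cell_value) layout as l
sample = [
    [(1, 0.5), (2, 0.25)],
    [(2, 0.75)],
    [(0, 0.1), (1, 0.2), (5, 0.3)],
]

# Step 1: each row of tuples becomes a {column_label: cell_value} dict
row_dicts = list(map(dict, sample))

# Step 2: key the row dicts by row number -> {0: {...}, 1: {...}, 2: {...}}
by_row = dict(enumerate(row_dicts))

# Step 3: a dict of dicts is read with the outer keys as columns, so transpose
# to get one dataframe row per original list entry
wide = pd.DataFrame(by_row).T

# Step 4: order the columns by label and replace the missing cells with 0
df_small = wide.sort_index(axis=1).fillna(0)
print(df_small)
```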

You need to convert each list of tuples into a dictionary so that pandas can parse it

import pandas as pd

# Equivalent, spelled out with a dict comprehension:
# l = [{key: val for key, val in row} for row in l]
# df = pd.DataFrame(l).fillna(0).sort_index(axis=1)
df = pd.DataFrame([dict(row) for row in l]).fillna(0).sort_index(axis=1)

Output

          0         1         2         3         4         5
0  0.000000  0.865677  0.089029  0.000000  0.000000  0.040294
1  0.000000  0.591875  0.044402  0.000000  0.052046  0.306666
2  0.000000  0.263272  0.260789  0.000000  0.241608  0.229584
3  0.000000  0.000000  0.924984  0.000000  0.000000  0.065141
4  0.000000  0.988295  0.000000  0.000000  0.000000  0.000000
5  0.234126  0.031903  0.030444  0.648067  0.053342  0.000000
6  0.056099  0.000000  0.000000  0.908477  0.000000  0.031809
7  0.000000  0.000000  0.398335  0.000000  0.520581  0.077260
8  0.468127  0.106430  0.000000  0.159624  0.017918  0.245521
9  0.255630  0.739199  0.000000  0.000000  0.000000  0.000000
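One caveat, offered as a sketch rather than part of the original answer: building the frame from dicts only creates columns for labels that occur in at least one row. If the full set of labels is known up front, `reindex` can add the missing ones. Here `sample` is a hypothetical shortened list and `n_cols = 6` is an assumption about how many columns are expected:

```python
import pandas as pd

# Hypothetical sample in which labels 3 and 4 never occur
sample = [
    [(1, 0.5), (2, 0.25)],
    [(2, 0.75)],
    [(0, 0.1), (1, 0.2), (5, 0.3)],
]

n_cols = 6  # assumed number of expected columns (labels 0..5)

# Build from row dicts, then force the full, ordered set of column labels;
# fillna(0) covers both the gaps within rows and the newly added columns
df_small = pd.DataFrame([dict(row) for row in sample])
df_full = df_small.reindex(columns=range(n_cols)).fillna(0)
print(df_full)
```

Because `reindex` also puts the columns in the requested order, a separate `sort_index(axis=1)` is not needed in this variant.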