Convert jagged array to Pandas dataframe
I'm trying to take a jagged 2D list like the following
l = [
[(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
[(1, 0.5918752), (2, 0.04440181), (4, 0.05204634), (5, 0.3066661)],
[(1, 0.26327166), (2, 0.26078925), (4, 0.24160784), (5, 0.22958432)],
[(2, 0.92498404), (5, 0.065140516)],
[(1, 0.9882947)],
[(0, 0.23412614), (1, 0.031903207), (2, 0.03044448), (3, 0.6480669), (4, 0.053342175)],
[(0, 0.056099385), (3, 0.9084766), (5, 0.031809118)],
[(2, 0.39833495), (4, 0.52058107), (5, 0.077259734)],
[(0, 0.46812743), (1, 0.10643007), (3, 0.15962379), (4, 0.017917762), (5, 0.24552101)],
[(0, 0.2556301), (1, 0.7391994)]
]
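and convert it into a dataframe with one row per inner list and one column per label.
In l, each row may or may not contain every column. Each tuple is structured as (column_label, cell_value). If a column is missing from a row, its value should be set to 0 in the dataframe.
I tried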
topics_df = pd.DataFrame(l).fillna(0)
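but that produces a dataframe whose cells hold the tuples themselves, instead of placing each value under its column label.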
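To make the problem concrete, here is a minimal sketch of what the naive call does on a shortened copy of the data. The variable names `rows` and `naive` are mine, not from the question; the point is only that pandas treats each tuple as an opaque cell and numbers the columns by position.

```python
import pandas as pd

# Shortened copy of the jagged list from the question (first, fourth and fifth rows).
rows = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(2, 0.92498404), (5, 0.065140516)],
    [(1, 0.9882947)],
]

# Passing the jagged list directly: shorter rows are padded, then fillna(0) fills the padding.
naive = pd.DataFrame(rows).fillna(0)

# Columns are positional (0, 1, 2), not the labels stored inside the tuples.
print(naive.columns.tolist())

# Each cell is still the whole (column_label, cell_value) tuple.
print(naive.iloc[0, 0])
```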
Let's try reshaping the list into a form that a pandas DataFrame can recognize
df = pd.DataFrame(dict(enumerate(list(map(dict,l))))).T.sort_index(axis=1).fillna(0)
Out[17]:
0 1 2 3 4 5
0 0.000000 0.865677 0.089029 0.000000 0.000000 0.040294
1 0.000000 0.591875 0.044402 0.000000 0.052046 0.306666
2 0.000000 0.263272 0.260789 0.000000 0.241608 0.229584
3 0.000000 0.000000 0.924984 0.000000 0.000000 0.065141
4 0.000000 0.988295 0.000000 0.000000 0.000000 0.000000
5 0.234126 0.031903 0.030444 0.648067 0.053342 0.000000
6 0.056099 0.000000 0.000000 0.908477 0.000000 0.031809
7 0.000000 0.000000 0.398335 0.000000 0.520581 0.077260
8 0.468127 0.106430 0.000000 0.159624 0.017918 0.245521
9 0.255630 0.739199 0.000000 0.000000 0.000000 0.000000
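If the one-liner is hard to read, it can be unpacked into its steps. This is only a sketch on three of the rows, and the intermediate names `row_dicts` and `by_row` are made up for illustration; the calls themselves are the same ones used above.

```python
import pandas as pd

# Three rows from the question, chosen so that every label 0..5 appears at least once.
l = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(0, 0.056099385), (3, 0.9084766), (5, 0.031809118)],
    [(2, 0.39833495), (4, 0.52058107), (5, 0.077259734)],
]

# Step 1: each row of (column_label, cell_value) tuples becomes {column_label: cell_value}.
row_dicts = list(map(dict, l))

# Step 2: key every row dict by its row number: {0: {...}, 1: {...}, 2: {...}}.
by_row = dict(enumerate(row_dicts))

# Step 3: a dict of dicts is read column-wise (outer keys become columns),
# so transpose to get one dataframe row per original row.
df = pd.DataFrame(by_row).T

# Step 4: order the column labels and replace the missing entries with 0.
df = df.sort_index(axis=1).fillna(0)
print(df)
```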
You need to change the lists of tuples into dictionaries so that pandas can parse them
# l = [{key: val for key, val in row} for row in l]
# df = pd.DataFrame(l).fillna(0).sort_index(axis=1)
df = pd.DataFrame([dict(row) for row in l]).fillna(0).sort_index(axis=1)
Output
0 1 2 3 4 5
0 0.000000 0.865677 0.089029 0.000000 0.000000 0.040294
1 0.000000 0.591875 0.044402 0.000000 0.052046 0.306666
2 0.000000 0.263272 0.260789 0.000000 0.241608 0.229584
3 0.000000 0.000000 0.924984 0.000000 0.000000 0.065141
4 0.000000 0.988295 0.000000 0.000000 0.000000 0.000000
5 0.234126 0.031903 0.030444 0.648067 0.053342 0.000000
6 0.056099 0.000000 0.000000 0.908477 0.000000 0.031809
7 0.000000 0.000000 0.398335 0.000000 0.520581 0.077260
8 0.468127 0.106430 0.000000 0.159624 0.017918 0.245521
9 0.255630 0.739199 0.000000 0.000000 0.000000 0.000000
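One caveat: a label that never occurs in any row gets no column at all with this approach. If the full set of labels is known up front, a possible variant is to reindex the columns before filling. The sketch below assumes the labels are the integers 0 to 5; `n_labels` and `rows` are hypothetical names, not something given in the question.

```python
import pandas as pd

# Shortened copy of the data; labels 0, 3 and 4 never appear in these rows.
rows = [
    [(1, 0.8656769), (2, 0.08902887), (5, 0.040293545)],
    [(2, 0.92498404), (5, 0.065140516)],
    [(1, 0.9882947)],
]

# Assumption: the complete label set is 0..n_labels-1.
n_labels = 6

df = (
    pd.DataFrame([dict(row) for row in rows])  # one dict per row, keyed by column label
    .reindex(columns=range(n_labels))          # add absent labels and put columns in order
    .fillna(0)                                 # missing cells become 0
)
print(df)
```

Reindexing also orders the columns, so sort_index(axis=1) is not needed in this variant.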