Python 3.x - aggregate pandas df rows and columns
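I am reading a CSV file into a pandas DataFrame. The file has repeated labels in both its rows and its columns, and I need to aggregate over those duplicates.
The CSV file looks like this: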
p/q/[val] | 1 | 1 | 1 | 2 | 2 | 2 | 3 | 3 | 4 | 4 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 85.09227753 | 79.70470428 | 14.60372257 | 35.94606018 | 38.66883087 | 43.14413452 | 62.1992569 | 61.9662056 | 47.01652908 | 55.35105515 |
1 | 77.67690277 | 72.28933716 | 35.8657341 | 10.12055206 | 22.38080597 | 35.09898376 | 39.91122818 | 48.39712524 | 37.95729065 | 42.97728348 |
1 | 71.51867676 | 66.13111115 | 38.59518433 | 22.38080406 | 11.31649399 | 29.02029228 | 43.14096069 | 44.00777054 | 41.19556427 | 36.96442413 |
2 | 78.38805389 | 73.00048828 | 34.14358902 | 35.09897995 | 29.02029228 | 13.26141262 | 36.20913696 | 52.90936279 | 36.04150391 | 41.10220718 |
2 | 87.69218445 | 82.30461884 | 62.14162445 | 39.91123581 | 43.14096451 | 36.20913696 | 15.41283798 | 52.42485428 | 53.06882477 | 55.80033112 |
2 | 68.89026642 | 63.50270844 | 52.83700562 | 45.54430771 | 41.67800522 | 48.60984421 | 50.78954315 | 13.5169096 | 37.65000153 | 36.0362854 |
3 | 71.05574036 | 65.66817474 | 37.6963768 | 34.8531723 | 41.11572266 | 36.43598175 | 55.10356522 | 39.11390305 | 11.24700832 | 23.63844109 |
3 | 68.75523376 | 63.36768723 | 46.03090668 | 43.18769836 | 49.4425621 | 45.19208527 | 53.05971527 | 39.51002502 | 23.63843918 | 11.75947094 |
4 | 71.51867676 | 66.13111115 | 38.59518433 | 22.38080406 | 11.31649399 | 29.02029228 | 43.14096069 | 44.00777054 | 41.19556427 | 36.96442413 |
4 | 68.89026642 | 63.50270844 | 52.83700562 | 45.54430771 | 41.67800522 | 48.60984421 | 50.78954315 | 13.5169096 | 37.65000153 | 36.0362854 |
When I read the CSV file into a pandas DataFrame, pandas renames the duplicate column names by appending decimal suffixes, like this:
p/q/[val] | 1 | 1.1 | 1.2 | 2 | 2.1 | 2.2 | 3 | 3.1 | 4 | 4.1 |
---|---|---|---|---|---|---|---|---|---|---|
1 | 85.09227753 | 79.70470428 | 14.60372257 | 35.94606018 | 38.66883087 | 43.14413452 | 62.1992569 | 61.9662056 | 47.01652908 | 55.35105515 |
1 | 77.67690277 | 72.28933716 | 35.8657341 | 10.12055206 | 22.38080597 | 35.09898376 | 39.91122818 | 48.39712524 | 37.95729065 | 42.97728348 |
1 | 71.51867676 | 66.13111115 | 38.59518433 | 22.38080406 | 11.31649399 | 29.02029228 | 43.14096069 | 44.00777054 | 41.19556427 | 36.96442413 |
2 | 78.38805389 | 73.00048828 | 34.14358902 | 35.09897995 | 29.02029228 | 13.26141262 | 36.20913696 | 52.90936279 | 36.04150391 | 41.10220718 |
2 | 87.69218445 | 82.30461884 | 62.14162445 | 39.91123581 | 43.14096451 | 36.20913696 | 15.41283798 | 52.42485428 | 53.06882477 | 55.80033112 |
2 | 68.89026642 | 63.50270844 | 52.83700562 | 45.54430771 | 41.67800522 | 48.60984421 | 50.78954315 | 13.5169096 | 37.65000153 | 36.0362854 |
3 | 71.05574036 | 65.66817474 | 37.6963768 | 34.8531723 | 41.11572266 | 36.43598175 | 55.10356522 | 39.11390305 | 11.24700832 | 23.63844109 |
3 | 68.75523376 | 63.36768723 | 46.03090668 | 43.18769836 | 49.4425621 | 45.19208527 | 53.05971527 | 39.51002502 | 23.63843918 | 11.75947094 |
4 | 71.51867676 | 66.13111115 | 38.59518433 | 22.38080406 | 11.31649399 | 29.02029228 | 43.14096069 | 44.00777054 | 41.19556427 | 36.96442413 |
4 | 68.89026642 | 63.50270844 | 52.83700562 | 45.54430771 | 41.67800522 | 48.60984421 | 50.78954315 | 13.5169096 | 37.65000153 | 36.0362854 |
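For illustration, the renaming can be reproduced with a tiny in-memory CSV. This is only a sketch: the values and the three-column layout are made up, not taken from the file above.

```python
import io

import pandas as pd

# Made-up miniature of the file: the header "1" appears twice.
csv_text = (
    "p/q/[val],1,1,2\n"
    "1,10.0,20.0,30.0\n"
    "1,30.0,40.0,50.0\n"
    "2,50.0,60.0,70.0\n"
)

df = pd.read_csv(io.StringIO(csv_text))

# read_csv de-duplicates repeated headers by appending ".1", ".2", ...
# The resulting column names are strings, not numbers.
print(list(df.columns))  # ['p/q/[val]', '1', '1.1', '2']
```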
I need to aggregate both the rows and the columns, so that my final DataFrame looks like this:
p/q/[val] | 1 | 2 | 3 | 4 |
---|---|---|---|---|
1 | 60.1641834 | 27.56410641 | 49.93709119 | 43.57702446 |
2 | 66.98894882 | 36.94157547 | 36.87710746 | 43.28319232 |
3 | 58.76235326 | 41.70453707 | 46.69680214 | 17.57083988 |
4 | 60.24582545 | 33.09162458 | 37.863796 | 37.96156883 |
In Excel, I can do this in two steps using the following formulas:
Step 1 - aggregate rows:
Step 2 - aggregate columns:
I'm just not sure how to do this in Python.
If the frame really does have identical (duplicate) labels on its rows and columns:
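    (df
     .set_index('p/q/[val]')
     .groupby(level=0).mean()      # average rows that share an index label
     # average columns that share a label; transposing avoids
     # groupby(axis=1), which is deprecated in recent pandas versions
     .T.groupby(level=0).mean().T
    )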
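As a minimal sketch of this two-step aggregation, the example below builds a toy frame whose labels are genuinely duplicated. The values are made up for illustration, and the column step uses transposes rather than `axis=1`.

```python
import pandas as pd

# Toy data (not from the question): label 1 repeats on the rows,
# label "1" repeats on the columns.
df = pd.DataFrame(
    [[1, 10.0, 20.0, 30.0],
     [1, 30.0, 40.0, 50.0],
     [2, 50.0, 60.0, 70.0]],
    columns=["p/q/[val]", "1", "1", "2"],
)

out = (
    df.set_index("p/q/[val]")
      .groupby(level=0).mean()       # step 1: average rows with the same label
      .T.groupby(level=0).mean().T   # step 2: average columns with the same label
)
print(out)
```

Because every (row label, column label) block is rectangular, averaging rows first and columns second gives the same result as averaging all cells of the block at once, as long as there are no missing values.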
Alternative:
(df
.melt(id_vars='p/q/[val]')
.groupby(['p/q/[val]', 'variable'])['value'].mean()
.unstack()
)
Output:

                       1          2          3          4
    p/q/[val]
    1          60.164183  27.564106  49.937091  43.577024
    2          66.988949  36.941575  36.877107  43.283192
    3          58.762353  41.704537  46.696802  17.570840
    4          60.245825  33.091625  37.863796  37.961569
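The melt-based alternative can also be sketched for column names that `read_csv` has already suffixed (`1`, `1.1`, ...). The toy values and the `long` variable name below are assumptions for illustration; the suffix is stripped from the `variable` column before grouping.

```python
import pandas as pd

# Toy frame (made-up values) with column names as read_csv would produce them.
df = pd.DataFrame(
    [[1, 10.0, 20.0, 30.0],
     [1, 30.0, 40.0, 50.0],
     [2, 50.0, 60.0, 70.0]],
    columns=["p/q/[val]", "1", "1.1", "2"],
)

# Reshape to long form: one row per (row label, column name, value).
long = df.melt(id_vars="p/q/[val]")

# Strip the ".1", ".2" suffixes so duplicates share one name again.
long["variable"] = long["variable"].str.split(".").str[0]

out = (
    long.groupby(["p/q/[val]", "variable"])["value"]
        .mean()      # average every cell in each (row, column) block
        .unstack()   # column labels back onto the horizontal axis
)
print(out)
```

This variant averages all cells of a block in one pass, so with missing values it may differ slightly from the two-step row-then-column average.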
If the columns are in the form 1, 1.1, etc., add a rename step:
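    (df
     .set_index('p/q/[val]')
     # keep the text before the first dot: '1.1' -> '1', '1' -> '1'
     # (rpartition would map an un-suffixed '1' to an empty string)
     .rename(columns=lambda x: x.partition('.')[0])
     .groupby(level=0).mean()      # average rows that share an index label
     # average columns that share a label; transposing avoids
     # groupby(axis=1), which is deprecated in recent pandas versions
     .T.groupby(level=0).mean().T
    )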
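Putting the pieces together, here is a hedged end-to-end sketch that uses only the lower-right corner of the question's data (row labels 3 and 4, column labels 3 and 4). The helper name `aggregate_duplicate_labels` is hypothetical, and the code assumes the original column labels contain no dots of their own.

```python
import io

import pandas as pd


def aggregate_duplicate_labels(df, key="p/q/[val]"):
    """Average rows and columns whose labels repeat (hypothetical helper)."""
    return (
        df.set_index(key)
          # '3.1' -> '3'; assumes the real labels themselves contain no '.'
          .rename(columns=lambda c: str(c).partition(".")[0])
          .groupby(level=0).mean()       # collapse duplicate row labels
          .T.groupby(level=0).mean().T   # collapse duplicate column labels
    )


# Subset of the question's CSV: rows 3 and 4, columns 3 and 4.
csv_text = (
    "p/q/[val],3,3,4,4\n"
    "3,55.10356522,39.11390305,11.24700832,23.63844109\n"
    "3,53.05971527,39.51002502,23.63843918,11.75947094\n"
    "4,43.14096069,44.00777054,41.19556427,36.96442413\n"
    "4,50.78954315,13.5169096,37.65000153,36.0362854\n"
)

df = pd.read_csv(io.StringIO(csv_text))  # columns become '3', '3.1', '4', '4.1'
result = aggregate_duplicate_labels(df)
print(result)
```

The four values in `result` should agree with the matching cells of the desired table in the question (for example about 17.5708 for row 3, column 4). Note that the column labels are strings, so with labels of 10 or more the grouped columns would sort lexically ("10" before "2").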