Pandas: re-shape/re-pivot a data frame after groupby
I applied the quantile function to the duration column of a data frame:
a=df.groupby('version')[['duration']].quantile([.25, .5, .75])
a
              duration
version
4229    0.25   1451.00
        0.50   1451.00
        0.75   1451.00
6065    0.25    213.75
        0.50    426.50
        0.75    639.25
9209    0.25    386.50
        0.50    861.00
        0.75    866.00
2304    0.25    664.50
        0.50    669.00
        0.75    736.50
6389    0.25      1.00
        0.50    797.00
        0.75    832.00
I would like to know how to re-shape/re-pivot the data frame above so that the new data frame (yes, it has to be a data frame) looks like this:
version  duration_Q1  duration_Q2  duration_Q3
4229         1451.00      1451.00      1451.00
6065          213.75       426.50       639.25
9209          386.50       861.00       866.00
2304          664.50       669.00       736.50
6389            1.00       797.00       832.00
Thanks!
You can use unstack, followed by some renaming:
import pandas as pd

# Rebuild a grouped quantile frame; the tuple keys become a (version, quantile) MultiIndex
a = pd.DataFrame({'duration': {(2304, 0.25): 1565.6861959516361,
                               (2304, 0.5): 446.4769649280514,
                               (2304, 0.75): 701.8254115357969,
                               (4229, 0.25): 1868.982390749203,
                               (4229, 0.5): 242.36201172579996,
                               (4229, 0.75): 789.482292226787,
                               (6065, 0.25): 1421.9585894685038,
                               (6065, 0.5): 357.04491735326343,
                               (6065, 0.75): 169.78973203074895,
                               (6389, 0.25): 1789.1550141153925,
                               (6389, 0.5): 516.9365429825862,
                               (6389, 0.75): 1830.6493228794639,
                               (9209, 0.25): 1129.853279993191,
                               (9209, 0.5): 1759.1258334115485,
                               (9209, 0.75): 1499.0498929925702}})

# Move the inner (quantile) index level into the columns
pvt = a.unstack()
# Drop the outer 'duration' column level, leaving 0.25 / 0.5 / 0.75 as column labels
pvt.columns = pvt.columns.droplevel(0)
pvt.rename(columns={0.25: 'duration_Q1', 0.5: 'duration_Q2', 0.75: 'duration_Q3'}, inplace=True)
         duration_Q1  duration_Q2  duration_Q3
version
2304     1565.686196   446.476965   701.825412
4229     1868.982391   242.362012   789.482292
6065     1421.958589   357.044917   169.789732
6389     1789.155014   516.936543  1830.649323
9209     1129.853280  1759.125833  1499.049893
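For completeness, here is a minimal end-to-end sketch of the same idea, starting from raw rows rather than from the already-grouped frame. The sample `df` is invented for illustration (the original data is not shown in the question), and the `reset_index()` at the end is an assumption about the desired layout: it turns `version` into an ordinary column, as in the target table in the question.

```python
import pandas as pd

# Hypothetical sample data; the real `df` from the question is not available.
df = pd.DataFrame({
    "version": [6065, 6065, 9209, 9209, 9209],
    "duration": [1.0, 852.0, 2.0, 861.0, 871.0],
})

wide = (
    df.groupby("version")["duration"]   # select a Series so there is no extra column level
      .quantile([0.25, 0.5, 0.75])      # Series indexed by (version, quantile)
      .unstack()                        # quantile level becomes the columns
      .rename(columns={0.25: "duration_Q1",
                       0.5: "duration_Q2",
                       0.75: "duration_Q3"})
      .reset_index()                    # make `version` a regular column
)
print(wide)
```

Selecting `["duration"]` with single brackets yields a Series, so `unstack()` produces flat column labels and the `droplevel(0)` step is not needed in this variant.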