How to pivot a table from a CSV horizontally in Python using a pandas DataFrame?
I have data in this format:
MonthYear HPI Div State_fips
1-1993 105.45 7 5
2-1993 105.58 7 5
3-1993 106.23 7 5
4-1993 106.63 7 5
Required pivot table:
State_fips 1-1993 2-1993 3-1993 4-1993
5 105.45 105.58 106.23 106.63
(I am very new to pandas.)
Use set_index with unstack:
df1 = df.set_index(['State_fips', 'MonthYear'])['HPI'].unstack()
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 106.63
Or use pivot:
df1 = df.pivot(index='State_fips', columns='MonthYear', values='HPI')
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 106.63
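The two one-liners above assume a DataFrame named df already exists. A minimal, self-contained sketch that builds df from the rows in the question and checks that both approaches give the same wide table (the names via_unstack and via_pivot are only illustrative):

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 106.63],
    "Div": [7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5],
})

# Approach 1: move both keys into the index, then unstack MonthYear into columns.
via_unstack = df.set_index(["State_fips", "MonthYear"])["HPI"].unstack()

# Approach 2: let pivot do the same reshaping in one call.
via_pivot = df.pivot(index="State_fips", columns="MonthYear", values="HPI")

print(via_pivot)
print(via_unstack.equals(via_pivot))
```

Both calls only reshape; neither aggregates, so each (State_fips, MonthYear) pair must occur at most once.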
But if there are duplicates, you need to aggregate with groupby or pivot_table; mean can be changed to sum, median, ...:
print (df)
MonthYear HPI Div State_fips
0 1-1993 105.45 7 5
1 2-1993 105.58 7 5
2 3-1993 106.23 7 5
3 4-1993 100.00 7 5 <-duplicates same 4-1993, 5
4 4-1993 200.00 7 5 <-duplicates same 4-1993, 5
df1 = df.pivot_table(index='State_fips', columns='MonthYear', values='HPI', aggfunc='mean')
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 150.0 <- (100+200)/2 = 150
df1 = df.groupby(['State_fips', 'MonthYear'])['HPI'].mean().unstack()
MonthYear 1-1993 2-1993 3-1993 4-1993
State_fips
5 105.45 105.58 106.23 150.0 <- (100+200)/2 = 150
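To see why aggregation is needed, here is a small sketch using the duplicated rows above. It shows that plain pivot refuses duplicate (State_fips, MonthYear) pairs, and that the aggregation function can be swapped. The variable names pivot_error, total and middle are only illustrative:

```python
import pandas as pd

# Sample rows with the duplicated 4-1993 entry, as printed above.
df = pd.DataFrame({
    "MonthYear": ["1-1993", "2-1993", "3-1993", "4-1993", "4-1993"],
    "HPI": [105.45, 105.58, 106.23, 100.00, 200.00],
    "Div": [7, 7, 7, 7, 7],
    "State_fips": [5, 5, 5, 5, 5],
})

# pivot cannot decide which of the two 4-1993 values to keep, so it raises.
try:
    df.pivot(index="State_fips", columns="MonthYear", values="HPI")
    pivot_error = None
except ValueError as exc:
    pivot_error = str(exc)
print(pivot_error)

# Same reshaping, but duplicates are combined with sum instead of mean.
total = df.pivot_table(index="State_fips", columns="MonthYear",
                       values="HPI", aggfunc="sum")

# groupby equivalent, here with median.
middle = df.groupby(["State_fips", "MonthYear"])["HPI"].median().unstack()

print(total)
print(middle)
```

Which aggregation is right depends on what the duplicates mean in your data; mean, sum and median are just the common choices.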
Finally, if you need to create a column from the index and remove the columns name:
df1 = df1.reset_index().rename_axis(None, axis=1)
print (df1)
State_fips 1-1993 2-1993 3-1993 4-1993
0 5 105.45 105.58 106.23 150.0
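Since the question starts from a CSV, here is a rough end-to-end sketch: read, pivot, flatten, write back. It assumes a comma-separated file with the header shown in the question. An in-memory io.StringIO stands in for the real file, so replace it with your own path. The explicit column sort is an extra precaution, not part of the answer above: the labels are strings, so a label such as 10-1993 would otherwise sort before 2-1993.

```python
import io
import pandas as pd

# Stand-in for a file on disk; the comma-separated layout is an assumption.
csv_text = """MonthYear,HPI,Div,State_fips
1-1993,105.45,7,5
2-1993,105.58,7,5
3-1993,106.23,7,5
4-1993,106.63,7,5
"""

df = pd.read_csv(io.StringIO(csv_text))

# Pivot to wide format, then turn the index into a regular column
# and drop the "MonthYear" label from the column axis.
wide = (
    df.pivot(index="State_fips", columns="MonthYear", values="HPI")
      .reset_index()
      .rename_axis(None, axis=1)
)

# Order the month columns by calendar date rather than by string comparison.
month_cols = sorted(
    (c for c in wide.columns if c != "State_fips"),
    key=lambda label: pd.to_datetime(label, format="%m-%Y"),
)
wide = wide[["State_fips"] + month_cols]

# Write the wide table back out as CSV (to a string here, for illustration).
out = io.StringIO()
wide.to_csv(out, index=False)
print(out.getvalue())
```

If the file can contain duplicate month rows per state, swap pivot for pivot_table with a suitable aggfunc as described above.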