How to horizontally pivot a table in Python
I currently have a table in this format:
Geo-id Name stat Year index
111500 Anniston-Oxford-Jacksonville AL 1991 0
111500 Anniston-Oxford-Jacksonville AL 1992 91.675
111500 Anniston-Oxford-Jacksonville AL 1993 93.8025
111500 Anniston-Oxford-Jacksonville AL 1994 96.63
111500 Anniston-Oxford-Jacksonville AL 1995 99.455
111500 Anniston-Oxford-Jacksonville AL 1996 102.4875
111500 Anniston-Oxford-Jacksonville AL 1997 109.0225
111500 Anniston-Oxford-Jacksonville AL 1998 114.7075
111500 Anniston-Oxford-Jacksonville AL 1999 116.005
112220 Auburn-Opelika AL 1992 90.695
112220 Auburn-Opelika AL 1993 94.2075
112220 Auburn-Opelika AL 1994 98.6825
112220 Auburn-Opelika AL 1995 103.3675
112220 Auburn-Opelika AL 1996 107.2725
112220 Auburn-Opelika AL 1997 111.7125
It should be converted to:
Geo-id Name 1991 1992 1993 1994 ........... 2017
111500 Anniston-Oxford-Jacksonville 0 91.675 93.8025 96.63
112220 Auburn-Opelika 0 90.695 94.2075 98.6825 and so on .....
Geo-id and Name should stay vertical (but each listed only once, since the table is pivoted horizontally).
My current code:
pre_horizontal_df = pd.read_csv('database_raw.csv')
pre_horizontal_df['period'] = pre_horizontal_df.year.astype(str)
df1 = pre_horizontal_df.groupby(['geoid', 'name'])['hpi'].mean().unstack()
print (df1)
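But this does not work. Can this horizontal pivot be done with a Python DataFrame / pandas?
You need to add the Year column to the groupby so that unstack can create the new columns from it: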
df1=pre_horizontal_df.groupby(['Geo-id','Name','Year'])['index'].mean().unstack(fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000
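For readers without database_raw.csv, the same groupby/unstack step can be reproduced on a small in-memory frame. This is only a sketch: the rows are a subset of the sample in the question, and the column names are assumed to match the header shown there (Geo-id, Name, stat, Year, index).

```python
import pandas as pd

# A few rows copied from the question's sample; column names are assumed
# to match the header shown in the question.
rows = [
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1991, 0.0),
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1992, 91.675),
    (111500, "Anniston-Oxford-Jacksonville", "AL", 1993, 93.8025),
    (112220, "Auburn-Opelika", "AL", 1992, 90.695),
    (112220, "Auburn-Opelika", "AL", 1993, 94.2075),
]
pre_horizontal_df = pd.DataFrame(
    rows, columns=["Geo-id", "Name", "stat", "Year", "index"]
)

# Year must be the last grouping key: unstack() moves the innermost
# index level into the columns, and missing cells are filled with 0.
df1 = (
    pre_horizontal_df
    .groupby(["Geo-id", "Name", "Year"])["index"]
    .mean()
    .unstack(fill_value=0)
)
print(df1)
```

Auburn-Opelika has no 1991 row in the sample, so that cell comes from fill_value rather than from the data.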
Another solution with pivot_table:
df1 = pre_horizontal_df.pivot_table(index=['Geo-id', 'Name'],
columns='Year',
values='index',
fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000
Finally, if you need the index converted to columns:
df1 = df1.rename_axis(None, axis=1).reset_index()
print (df1)
Geo-id Name 1991 1992 1993 1994 \
0 111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300
1 112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825
1995 1996 1997 1998 1999
0 99.4550 102.4875 109.0225 114.7075 116.005
1 103.3675 107.2725 111.7125 0.0000 0.000
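The desired output in the question runs from 1991 to 2017, while the pivot only produces columns for years that occur in the data. One possible way to force a fixed set of year columns is reindex on the columns before resetting the index. The sketch below assumes the 1991 to 2017 range from the question and uses a made-up three-row frame called long_df.

```python
import pandas as pd

# Hypothetical long-format sample in the question's layout.
long_df = pd.DataFrame({
    "Geo-id": [111500, 111500, 112220],
    "Name": ["Anniston-Oxford-Jacksonville",
             "Anniston-Oxford-Jacksonville",
             "Auburn-Opelika"],
    "Year": [1991, 1992, 1992],
    "index": [0.0, 91.675, 90.695],
})

# Assumed range, taken from the desired output (1991 ... 2017).
years = list(range(1991, 2018))

wide = (
    long_df
    .pivot_table(index=["Geo-id", "Name"], columns="Year",
                 values="index", fill_value=0)
    .reindex(columns=years, fill_value=0)   # add the years missing from the data
    .rename_axis(None, axis=1)              # drop the "Year" label on the columns
    .reset_index()                          # turn Geo-id and Name back into columns
)
print(wide.shape)
```

Years that do not occur in the data end up as columns filled with 0, the same placeholder the answer uses for missing cells.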
EDIT:
If there are no duplicates in the columns used to create the new index and new columns, you can use set_index and unstack:
print (pre_horizontal_df[pre_horizontal_df.duplicated(['Geo-id','Name','Year'], keep=False)])
Empty DataFrame
Columns: [Geo-id, Name, stat, Year, index]
Index: []
df1 = pre_horizontal_df.set_index(['Geo-id', 'Name', 'Year'])['index'].unstack(fill_value=0)
print (df1)
Year 1991 1992 1993 1994 1995 \
Geo-id Name
111500 Anniston-Oxford-Jacksonville 0.0 91.675 93.8025 96.6300 99.4550
112220 Auburn-Opelika 0.0 90.695 94.2075 98.6825 103.3675
Year 1996 1997 1998 1999
Geo-id Name
111500 Anniston-Oxford-Jacksonville 102.4875 109.0225 114.7075 116.005
112220 Auburn-Opelika 107.2725 111.7125 0.0000 0.000
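The duplicate check matters because set_index plus unstack cannot put two values into one cell, whereas pivot_table (like groupby with mean) aggregates them. A small sketch of that difference, using a made-up frame dup_df that repeats one (Geo-id, Name, Year) combination:

```python
import pandas as pd

# Hypothetical data: the 1992 row appears twice with different values.
dup_df = pd.DataFrame({
    "Geo-id": [111500, 111500, 111500],
    "Name": ["Anniston-Oxford-Jacksonville"] * 3,
    "Year": [1992, 1992, 1993],
    "index": [90.0, 92.0, 93.8025],
})

keys = ["Geo-id", "Name", "Year"]
has_duplicates = dup_df.duplicated(keys).any()

try:
    dup_df.set_index(keys)["index"].unstack(fill_value=0)
    unstack_failed = False
except ValueError:
    # unstack refuses to reshape an index with duplicate entries
    unstack_failed = True

# pivot_table aggregates the duplicates instead (mean is the default).
df1 = dup_df.pivot_table(index=["Geo-id", "Name"], columns="Year",
                         values="index", fill_value=0)
print(has_duplicates, unstack_failed)
print(df1)
```

When the duplicated check returns an empty DataFrame, as in the answer, both approaches give the same table, so the choice only matters when duplicates are present.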