Pandas pivot table arrangement without aggregation
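I want to pivot a pandas DataFrame without aggregation, and instead of displaying the pivot index column vertically, I want to display it horizontally. I tried pd.pivot_table but I didn't get what I wanted.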
import pandas as pd
data = {'year': [2011, 2011, 2012, 2013, 2013],
        'A': [10, 21, 20, 10, 39],
        'B': [12, 45, 19, 10, 39]}
df = pd.DataFrame(data)
print(df)
A B year
0 10 12 2011
1 21 45 2011
2 20 19 2012
3 10 10 2013
4 39 39 2013
But I want:
year 2011 2012 2013
cols A B A B A B
0 10 12 20 19 10 10
1 21 45 NaN NaN 39 39
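For context, here is a minimal sketch (rebuilding the question's data) of why pd.pivot_table alone does not give that layout: it aggregates rows that share a key, using aggfunc='mean' by default, so the two 2011 rows collapse into one averaged row. The name pt is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'year': [2011, 2011, 2012, 2013, 2013],
                   'A': [10, 21, 20, 10, 39],
                   'B': [12, 45, 19, 10, 39]})

# pivot_table groups by 'year' and applies aggfunc='mean' (the default),
# so duplicate years are averaged instead of kept as separate rows
pt = pd.pivot_table(df, index='year', values=['A', 'B'])
print(pt)
# 2011 becomes A=15.5 (mean of 10 and 21) and B=28.5 (mean of 12 and 45)
```

Because of this, the answers below first add a per-year row counter so that no two rows share the same key.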
You can first create a helper column for the new index with cumcount, then reshape with stack and unstack:
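# number the rows within each year (0, 1, ...); this becomes the new row index
df['g'] = df.groupby('year')['year'].cumcount()
# move A/B into the index, then pull 'year' and the A/B level into the columns
df1 = df.set_index(['g','year']).stack().unstack([1,2])
print(df1)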
year 2011 2012 2013
A B A B A B
g
0 10.0 12.0 20.0 19.0 10.0 10.0
1 21.0 45.0 NaN NaN 39.0 39.0
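To make the two steps easier to follow, a sketch that prints the helper counter and then performs the same stack/unstack reshape. The data is rebuilt so the snippet runs on its own, and s is just an illustrative name for the intermediate stacked Series.

```python
import pandas as pd

df = pd.DataFrame({'year': [2011, 2011, 2012, 2013, 2013],
                   'A': [10, 21, 20, 10, 39],
                   'B': [12, 45, 19, 10, 39]})

# cumcount numbers the rows inside each year group: 0, 1, ...
df['g'] = df.groupby('year')['year'].cumcount()
print(df['g'].tolist())  # [0, 1, 0, 0, 1]

# stack() turns the A/B columns into the innermost index level,
# giving a Series indexed by (g, year, column name)
s = df.set_index(['g', 'year']).stack()

# unstack([1, 2]) moves 'year' and the A/B level back into the columns,
# leaving only g as the row index; missing (g, year) pairs become NaN
df1 = s.unstack([1, 2])
print(df1)
```

Since 2012 has only one row, its g=1 cells have nothing to fill them, which is where the NaN values (and the float dtype) in the output come from.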
If you need to set the column names, use rename_axis (new in pandas 0.18.0):
df['g'] = df.groupby('year')['year'].cumcount()
# parentheses let the method chain span several lines
df1 = (df.set_index(['g','year'])
         .stack()
         .unstack([1,2])
         .rename_axis(None)
         .rename_axis(('year','cols'), axis=1))
print(df1)
year 2011 2012 2013
cols A B A B A B
0 10.0 12.0 20.0 19.0 10.0 10.0
1 21.0 45.0 NaN NaN 39.0 39.0
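If chaining rename_axis twice feels awkward, the same names can also be assigned directly on the axes after the reshape. A minimal sketch, assuming df1 is built with the stack/unstack approach above (data rebuilt so it runs standalone):

```python
import pandas as pd

df = pd.DataFrame({'year': [2011, 2011, 2012, 2013, 2013],
                   'A': [10, 21, 20, 10, 39],
                   'B': [12, 45, 19, 10, 39]})
df['g'] = df.groupby('year')['year'].cumcount()
df1 = df.set_index(['g', 'year']).stack().unstack([1, 2])

# remove the 'g' name from the row index and label both column levels
df1.index.name = None
df1.columns.names = ['year', 'cols']
print(df1)
```

This mutates df1 in place, whereas rename_axis returns a new object and so fits better inside a method chain.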
Another solution uses pivot, but you need to swap the first and second levels of the MultiIndex in the columns with swaplevel and then sort them with sort_index:
df['g'] = df.groupby('year')['year'].cumcount()
df1 = df.pivot(index='g', columns='year')
df1 = df1.swaplevel(0,1, axis=1).sort_index(axis=1)
print (df1)
year 2011 2012 2013
A B A B A B
g
0 10.0 12.0 20.0 19.0 10.0 10.0
1 21.0 45.0 NaN NaN 39.0 39.0
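The swaplevel step is needed because pivot puts the value columns (A, B) on the outer column level and year on the inner one. The cumcount counter matters here too: pivot does not aggregate, so it needs every (g, year) pair to be unique. A sketch that checks the column order before and after the swap (wide is just an illustrative name):

```python
import pandas as pd

df = pd.DataFrame({'year': [2011, 2011, 2012, 2013, 2013],
                   'A': [10, 21, 20, 10, 39],
                   'B': [12, 45, 19, 10, 39]})
df['g'] = df.groupby('year')['year'].cumcount()

# with no values= argument, all remaining columns (A, B) become values
wide = df.pivot(index='g', columns='year')
# outer level is A/B, inner level is year: A-2011, A-2012, A-2013, B-2011, ...
print(wide.columns.tolist())

# put 'year' on the outer level, then sort so A and B sit under each year
df1 = wide.swaplevel(0, 1, axis=1).sort_index(axis=1)
print(df1)
```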
Use groupby('year') so that I can reset_index within each group to get the index values 0 and 1, then do a bunch of cleanup:
# select columns with a list: tuple-style ['A', 'B'] selection was removed in pandas 2.0
df.groupby('year')[['A', 'B']] \
  .apply(lambda g: g.reset_index(drop=True)) \
  .unstack(0).swaplevel(0, 1, axis=1).sort_index(axis=1)
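The trick in this answer is that reset_index(drop=True) renumbers each group's rows from 0, so after apply the index is (year, position within year), the same role cumcount plays in the other answers. A sketch that exposes that intermediate index; tmp and out are illustrative names only:

```python
import pandas as pd

df = pd.DataFrame({'year': [2011, 2011, 2012, 2013, 2013],
                   'A': [10, 21, 20, 10, 39],
                   'B': [12, 45, 19, 10, 39]})

# each group's rows are renumbered 0, 1, ...; apply then combines the groups
# under a (year, position) MultiIndex
tmp = df.groupby('year')[['A', 'B']].apply(lambda g: g.reset_index(drop=True))
print(tmp.index.tolist())

# unstack(0) moves 'year' into the columns as the inner level,
# so swap it to the outer level and sort, as in the pivot answer
out = tmp.unstack(0).swaplevel(0, 1, axis=1).sort_index(axis=1)
print(out)
```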