在 DataFrame 上应用 groupby 以显示累积统计信息
Apply groupby on a DataFrame to display cumulative stats
假设我有一个如下所示的 DataFrame:
Bank Name House This Wk
Barc Germany 100
Barc UK 300
Barc UK 500
JPM Japan 200
JPM NYC 100
BOA LA 900
BOA LA 50
BOA LA 50
DB Italy 45
我想按银行名称分组,同时输出最大的房屋价值以及总价值...
例如,使用上面的示例将导致:
Bank Name Total House This Wk
Barc 900 UK 500
JPM 300 Japan 200
BOA 1000 LA 900
DB 45 Italy 45
本质上,它是按银行名称对 Total
进行分组,但也输出最大的贡献者 House
到总数,贡献的金额为 This Wk
.
我该怎么做呢?
In [121]: df.groupby('Bank Name', group_keys=False) \
...: .apply(lambda x: x.nlargest(1, 'This Wk').assign(Total=x['This Wk'].sum())) \
...: [['Bank Name','Total','House','This Wk']]
...:
Out[121]:
Bank Name Total House This Wk
5 BOA 1000 LA 900
2 Barc 900 UK 500
8 DB 45 Italy 45
3 JPM 300 Japan 200
您可以考虑使用 dfGroupBy.agg
函数列表 df.groupby
:
In [732]: out = df.groupby('Bank Name')['This Wk'].agg(['sum', 'idxmax', 'max'])\
.rename(columns={'sum' : 'Total', 'idxmax' : 'House', 'max' : 'This Wk'})\
.reset_index()
In [734]: out['House'] = df.loc[out['House'], 'House'].values; out
Out[734]:
Bank Name Total House This Wk
0 BOA 1000 LA 900
1 Barc 900 UK 500
2 DB 45 Italy 45
3 JPM 300 Japan 200
另一种使用 apply
的方法是
In [17]: (df.groupby('Bank Name', sort=False)
.apply(lambda x: pd.Series(
[x['This Wk'].sum(),
x.loc[x['This Wk'].idxmax(), 'House'],
x['This Wk'].max()],
index=['Total', 'House', 'This Wk']))
.reset_index())
Out[17]:
Bank Name Total House This Wk
0 Barc 900 UK 500
1 JPM 300 Japan 200
2 BOA 1000 LA 900
3 DB 45 Italy 45
假设我有一个如下所示的 DataFrame:
Bank Name House This Wk
Barc Germany 100
Barc UK 300
Barc UK 500
JPM Japan 200
JPM NYC 100
BOA LA 900
BOA LA 50
BOA LA 50
DB Italy 45
我想按银行名称分组,同时输出最大的房屋价值以及总价值...
例如,使用上面的示例将导致:
Bank Name Total House This Wk
Barc 900 UK 500
JPM 300 Japan 200
BOA 1000 LA 900
DB 45 Italy 45
本质上,它是按银行名称对 Total
进行分组,但也输出最大的贡献者 House
到总数,贡献的金额为 This Wk
.
我该怎么做呢?
In [121]: df.groupby('Bank Name', group_keys=False) \
...: .apply(lambda x: x.nlargest(1, 'This Wk').assign(Total=x['This Wk'].sum())) \
...: [['Bank Name','Total','House','This Wk']]
...:
Out[121]:
Bank Name Total House This Wk
5 BOA 1000 LA 900
2 Barc 900 UK 500
8 DB 45 Italy 45
3 JPM 300 Japan 200
您可以考虑使用 dfGroupBy.agg
函数列表 df.groupby
:
In [732]: out = df.groupby('Bank Name')['This Wk'].agg(['sum', 'idxmax', 'max'])\
.rename(columns={'sum' : 'Total', 'idxmax' : 'House', 'max' : 'This Wk'})\
.reset_index()
In [734]: out['House'] = df.loc[out['House'], 'House'].values; out
Out[734]:
Bank Name Total House This Wk
0 BOA 1000 LA 900
1 Barc 900 UK 500
2 DB 45 Italy 45
3 JPM 300 Japan 200
另一种使用 apply
的方法是
In [17]: (df.groupby('Bank Name', sort=False)
.apply(lambda x: pd.Series(
[x['This Wk'].sum(),
x.loc[x['This Wk'].idxmax(), 'House'],
x['This Wk'].max()],
index=['Total', 'House', 'This Wk']))
.reset_index())
Out[17]:
Bank Name Total House This Wk
0 Barc 900 UK 500
1 JPM 300 Japan 200
2 BOA 1000 LA 900
3 DB 45 Italy 45