Group dataframe and get sum AND count?
I have a dataframe that looks like this:
              Company Name              Organisation Name   Amount
10118  Vifor Pharma UK Ltd  Welsh Assoc for Gastro & Endo  2700.00
10119  Vifor Pharma UK Ltd    Welsh IBD Specialist Group,   169.00
10120  Vifor Pharma UK Ltd             West Midlands AHSN  1200.00
10121  Vifor Pharma UK Ltd           Whittington Hospital    63.00
10122  Vifor Pharma UK Ltd                 Ysbyty Gwynedd    75.93
How do I sum the Amount column and count the Organisation Name column, to get a new dataframe that looks like this?
              Company Name  Organisation Count    Amount
10118  Vifor Pharma UK Ltd                   5  11000.00
I know how to sum or count:
df.groupby('Company Name').sum()
df.groupby('Company Name').count()
But not how to do both!
Try this:
In [110]: (df.groupby('Company Name')
.....: .agg({'Organisation Name':'count', 'Amount': 'sum'})
.....: .reset_index()
.....: .rename(columns={'Organisation Name':'Organisation Count'})
.....: )
Out[110]:
          Company Name   Amount  Organisation Count
0  Vifor Pharma UK Ltd  4207.93                   5
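For anyone who wants to run this end to end, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the rows shown in the question; the variable name `summary` is only for illustration. Note that the column order of the result can differ between pandas versions.

```python
import pandas as pd

# Sample rows copied from the question (index values included).
df = pd.DataFrame(
    {
        'Company Name': ['Vifor Pharma UK Ltd'] * 5,
        'Organisation Name': [
            'Welsh Assoc for Gastro & Endo',
            'Welsh IBD Specialist Group,',
            'West Midlands AHSN',
            'Whittington Hospital',
            'Ysbyty Gwynedd',
        ],
        'Amount': [2700.00, 169.00, 1200.00, 63.00, 75.93],
    },
    index=range(10118, 10123),
)

# Count organisations and sum amounts per company in a single agg call,
# then turn the group key back into a column and rename the count column.
summary = (
    df.groupby('Company Name')
      .agg({'Organisation Name': 'count', 'Amount': 'sum'})
      .reset_index()
      .rename(columns={'Organisation Name': 'Organisation Count'})
)
print(summary)
```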
Or if you don't want to reset the index:
df.groupby('Company Name')['Amount'].agg(['sum','count'])
or
df.groupby('Company Name').agg({'Amount': ['sum','count']})
Demo:
In [98]: df.groupby('Company Name')['Amount'].agg(['sum','count'])
Out[98]:
                         sum  count
Company Name
Vifor Pharma UK Ltd  4207.93      5
In [99]: df.groupby('Company Name').agg({'Amount': ['sum','count']})
Out[99]:
                      Amount
                         sum count
Company Name
Vifor Pharma UK Ltd  4207.93     5
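The dict-of-lists form above returns two-level column labels (`Amount` over `sum` and `count`). If flat column names are preferred, one possible follow-up is to join the two levels. This is only a sketch; the `Amount_sum` and `Amount_count` names come from the join and are not part of the answer above.

```python
import pandas as pd

# Only the columns needed for this illustration, taken from the question.
df = pd.DataFrame({
    'Company Name': ['Vifor Pharma UK Ltd'] * 5,
    'Amount': [2700.00, 169.00, 1200.00, 63.00, 75.93],
})

res = df.groupby('Company Name').agg({'Amount': ['sum', 'count']})

# res.columns is a MultiIndex: ('Amount', 'sum'), ('Amount', 'count').
# Join each pair of labels into a single flat name.
res.columns = ['_'.join(col) for col in res.columns]
print(res)
```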
If you have lots of columns and only one of them needs a different aggregation, you could do this:
In[1]: grouper = df.groupby('Company Name')
In[2]: res = grouper.count()
In[3]: res['Amount'] = grouper.Amount.sum()
In[4]: res
Out[4]:
                     Organisation Name   Amount
Company Name
Vifor Pharma UK Ltd                  5  4207.93
Note that you could then rename the Organisation Name column as you wish.
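A small sketch of that count-then-overwrite idea on a wider frame, including the rename. The data here is made up for illustration: the `Contact` column and the `A Ltd` / `B Ltd` rows are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical data: two companies and one extra column ('Contact').
df = pd.DataFrame({
    'Company Name': ['A Ltd', 'A Ltd', 'B Ltd'],
    'Organisation Name': ['Org 1', 'Org 2', 'Org 3'],
    'Contact': ['x', 'y', 'z'],
    'Amount': [10.0, 20.0, 5.0],
})

grouper = df.groupby('Company Name')

# Count every non-key column first ...
res = grouper.count()
# ... then overwrite the one column that should be summed instead.
res['Amount'] = grouper['Amount'].sum()

# Rename the count column afterwards, if desired.
res = res.rename(columns={'Organisation Name': 'Organisation Count'})
print(res)
```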
df.groupby('Company Name').agg({'Organisation Name':'count','Amount':'sum'})\
    .sort_values(['Organisation Name','Amount'], ascending=False)
Just in case you were wondering how to rename columns during aggregation, here's how for
pandas >= 0.25: Named Aggregation
df.groupby('Company Name')['Amount'].agg(MySum='sum', MyCount='count')
Or,
df.groupby('Company Name').agg(MySum=('Amount', 'sum'), MyCount=('Amount', 'count'))
                       MySum  MyCount
Company Name
Vifor Pharma UK Ltd  4207.93        5
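Named aggregation can also produce the exact column names asked for in the question. Since `Organisation Count` contains a space, it cannot be written as a keyword directly, so this sketch unpacks a dict into `agg`. It assumes pandas >= 0.25; the variable name `out` is only for illustration.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    'Company Name': ['Vifor Pharma UK Ltd'] * 5,
    'Organisation Name': [
        'Welsh Assoc for Gastro & Endo',
        'Welsh IBD Specialist Group,',
        'West Midlands AHSN',
        'Whittington Hospital',
        'Ysbyty Gwynedd',
    ],
    'Amount': [2700.00, 169.00, 1200.00, 63.00, 75.93],
})

# Each output column is given as name=(source column, aggregation).
# Dict unpacking allows output names that are not valid Python identifiers.
out = (
    df.groupby('Company Name')
      .agg(**{
          'Organisation Count': ('Organisation Name', 'count'),
          'Amount': ('Amount', 'sum'),
      })
      .reset_index()
)
print(out)
```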