How to calculate a mean per group using the size of another group?
I have a dataframe:
date id type revenue
0 2021-09-01 Zw b1 20.045350
1 2021-09-01 Aw c 8.990000
2 2021-09-01 Zc c 14.990000
3 2021-09-01 ww b 25.944510
4 2021-09-01 jw c 3.881649
5 2021-09-01 pw b 9.990000
6 2021-09-01 fg c 2.990000
7 2021-09-01 kl b 4.990000
8 2021-09-02 mm b 7.990000
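I want to calculate the mean revenue for each type, but with the denominator taken from the date group rather than from the type group. For example, the mean for type "b1" must not be 20.045350 (there is only one b1 row) but 20.045350/8 = 2.5, because the date column contains 8 rows with 2021-09-01. So the desired result is: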
date type revenue
0 2021-09-01 b1 2.5
0 2021-09-01 c 3.85
0 2021-09-01 b 5.11
0 2021-09-02 b 7.990000
How can I do this? groupby(["date", "type"]).mean() gives the wrong result:
date type revenue
0 2021-09-01 b1 20.045
0 2021-09-01 c 7.71
0 2021-09-01 b 13.64
0 2021-09-02 b 7.990000
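# Count rows per date, then merge the count back onto every row (joins on 'date')
df1 = df.groupby('date')['id'].count().reset_index().\
rename({'id':'count'}, axis = 1).merge(df)
# Divide each revenue by its date's row count, then sum per (date, type)
df2 = df1.assign(revenue = df1.revenue/df1['count']).groupby(['date','type']).\
agg({'revenue':'sum'}).reset_index()
df2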
date type revenue
0 2021-09-01 b 5.115564
1 2021-09-01 b1 2.505669
2 2021-09-01 c 3.856456
3 2021-09-02 b 7.990000
A fancier way, written as a single chain:
df.groupby('date')['id'].count().reset_index().rename({'id':'count'}, axis = 1).merge(df).\
pipe(lambda x: x.assign(revenue = x.revenue/x['count'])).groupby(['date','type']).\
agg({'revenue':'sum'}).reset_index()
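The merge in the code above only serves to attach the per-date row count to each row. As a minimal sketch of the same idea without the merge, `groupby(...).transform("size")` can produce that count already aligned to the original rows. The frame is rebuilt from the question's sample data so the snippet runs on its own; the names `rows_per_date` and `result` are illustrative and not from the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-09-01"] * 8 + ["2021-09-02"],
    "id": ["Zw", "Aw", "Zc", "ww", "jw", "pw", "fg", "kl", "mm"],
    "type": ["b1", "c", "c", "b", "c", "b", "c", "b", "b"],
    "revenue": [20.045350, 8.99, 14.99, 25.944510, 3.881649,
                9.99, 2.99, 4.99, 7.99],
})

# For every row, the number of rows that share its date
# (same index as df, so it can be used directly in arithmetic).
rows_per_date = df.groupby("date")["id"].transform("size")

# Scale each revenue by its date's row count, then sum per (date, type).
result = (
    df.assign(revenue=df["revenue"] / rows_per_date)
      .groupby(["date", "type"], as_index=False)["revenue"]
      .sum()
)
print(result)
```

Dividing before summing gives the same numbers as summing first and dividing afterwards, since every row in a (date, type) group shares the same divisor.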
Do two group-wise aggregations and divide one by the other:
(df.groupby(['type', 'date'])
.revenue
.sum()
.div(df.date.value_counts(), level='date')
)
type date
b 2021-09-01 5.115564
2021-09-02 7.990000
b1 2021-09-01 2.505669
c 2021-09-01 3.856456
dtype: float64
Explanation:
- Get the number of rows per date:
counts = df.date.value_counts()
- Get the revenue sum per type and date:
revenue_sum = df.groupby(['type', 'date']).revenue.sum()
- Divide revenue_sum by counts, aligning on the date level:
revenue_sum.div(counts, level='date')
type date
b 2021-09-01 5.115564
2021-09-02 7.990000
b1 2021-09-01 2.505669
c 2021-09-01 3.856456
dtype: float64
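The result above is a Series indexed by type and date, whereas the question asks for a flat table with date, type and revenue columns. One way to get there, sketched under the assumption that the sample data is as shown in the question, is to name the Series and reset its index. The variable names `per_date_mean` and `flat` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "date": ["2021-09-01"] * 8 + ["2021-09-02"],
    "id": ["Zw", "Aw", "Zc", "ww", "jw", "pw", "fg", "kl", "mm"],
    "type": ["b1", "c", "c", "b", "c", "b", "c", "b", "b"],
    "revenue": [20.045350, 8.99, 14.99, 25.944510, 3.881649,
                9.99, 2.99, 4.99, 7.99],
})

counts = df["date"].value_counts()                            # rows per date
revenue_sum = df.groupby(["type", "date"])["revenue"].sum()   # sum per (type, date)

# Divide each (type, date) sum by the row count of its date.
per_date_mean = revenue_sum.div(counts, level="date")

# Name the values explicitly (the name may be lost in the division),
# turn the index levels into columns, and order them as in the question.
flat = (
    per_date_mean.rename("revenue")
                 .reset_index()[["date", "type", "revenue"]]
)
print(flat)
```

The explicit `rename` is there because the two operands of the division can carry different names, in which case pandas leaves the result unnamed and `reset_index` would otherwise produce a column labelled `0`.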