Mean by group, excluding some rows

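I have a table and want to compute the mean Score per State, but only over the rows where Customer is 1. Rows where Customer is 0 should be left empty.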

Customer  State  Score  Output_Mean
0         GA     1
1         GA     2      2.5
1         GA     3      2.5
1         NY     9      8
1         NY     7      8
0         DC     6
0         DC     4

I have the following code. How do I add the Customer condition?

df['output_mean'] = (df.fillna({'state':'missing'}).groupby(['state'])['score'].transform(lambda x: x.mean()))

You can update only the rows you want:

customer_1 = df['Customer'].eq(1)
df.loc[customer_1, 'Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
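For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is reconstructed from the table in the question, and the intermediate name state_mean is only there for illustration.

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})

# Boolean mask selecting the rows that should take part in the mean.
customer_1 = df["Customer"].eq(1)

# Per-State mean computed over the selected rows only
# (state_mean is an illustrative name, not from the original answer).
state_mean = df[customer_1].groupby("State")["Score"].transform("mean")

# Write the result back to the selected rows; the other rows stay NaN.
df.loc[customer_1, "Output_Mean"] = state_mean

print(df)
```

Because the filter is applied before the groupby, the Customer 0 row in GA never enters the GA mean.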

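In most (recent) pandas versions you can drop the customer_1 mask on the left-hand side, because column assignment aligns on the index and leaves the unmatched rows as NaN: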

df['Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
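The reason the mask is not needed on the left is index alignment. As a rough illustration, using the sample data from the question and a hypothetical intermediate name partial:

```python
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})
customer_1 = df["Customer"].eq(1)

# transform returns one value per filtered row and keeps the original
# row labels of those rows (partial is an illustrative name).
partial = df[customer_1].groupby("State")["Score"].transform("mean")
print(partial.index.tolist())

# Assigning a Series to a column aligns on the index, so row labels
# that are absent from partial end up as NaN.
df["Output_Mean"] = partial
print(df)
```

The shorter Series is matched to the frame by label rather than by position, which is why the Customer 0 rows are filled with NaN instead of raising a length error.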

Or use query instead of a mask:

df['Output_Mean'] = df.query('Customer == 1').groupby('State')['Score'].transform('mean')

Output:

   Customer State  Score  Output_Mean
0         0    GA      1          NaN
1         1    GA      2          2.5
2         1    GA      3          2.5
3         1    NY      9          8.0
4         1    NY      7          8.0
5         0    DC      6          NaN
6         0    DC      4          NaN

Alternatively, you can include the Customer == 1 mask in the groupby (together with State) and use np.where to assign to df['output_mean']:

import numpy as np

mask = df['Customer']==1
df['output_mean'] = np.where(mask, df.fillna({'State':'missing'}).groupby([mask,'State'])['Score'].transform('mean'), np.nan)

Output:

   Customer State  Score  output_mean
0         0    GA      1          NaN
1         1    GA      2          2.5
2         1    GA      3          2.5
3         1    NY      9          8.0
4         1    NY      7          8.0
5         0    DC      6          NaN
6         0    DC      4          NaN
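To see why the mask goes into the groupby keys and not only into np.where, here is a small comparison sketch. It uses the sample data from the question; the column name naive and the intermediate names by_state and by_mask_state are made up for the comparison.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})
mask = df["Customer"] == 1

# Grouping by State alone mixes Customer 0 and Customer 1 rows,
# so the GA mean is taken over scores 1, 2 and 3.
by_state = df.groupby("State")["Score"].transform("mean")

# Grouping by the mask as well keeps the two populations apart,
# so the GA mean for Customer 1 is taken over scores 2 and 3 only.
by_mask_state = df.groupby([mask, "State"])["Score"].transform("mean")

# np.where only blanks out the Customer 0 rows; it does not change
# which rows contributed to each mean.
df["naive"] = np.where(mask, by_state, np.nan)
df["output_mean"] = np.where(mask, by_mask_state, np.nan)

print(df)
```

In this data the two columns differ only for GA, the one state that has both Customer 0 and Customer 1 rows.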