Mean by group, exclude some rows
I have a table and want to compute the mean Score by State, using only the rows where Customer is 1.
Customer | State | Score | Output_Mean |
---|---|---|---|
0 | GA | 1 | |
1 | GA | 2 | 2.5 |
1 | GA | 3 | 2.5 |
1 | NY | 9 | 8 |
1 | NY | 7 | 8 |
0 | DC | 6 | |
0 | DC | 4 | |
I have the following code. How do I add the Customer condition?
df['output_mean'] = (df.fillna({'state':'missing'}).groupby(['state'])['score'].transform(lambda x: x.mean()))
You can update only the rows you want:
customer_1 = df['Customer'].eq(1)
df.loc[customer_1, 'Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
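In most (recent) pandas versions you can drop `customer_1` on the left-hand side. The assignment aligns on the index, so the rows that were filtered out are left as NaN: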
df['Output_Mean'] = df[customer_1].groupby('State')['Score'].transform('mean')
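To see why the mask is not needed on the left-hand side, here is a minimal sketch of the index alignment involved. The DataFrame is rebuilt from the table in the question, and the intermediate name `per_state` is only for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})

customer_1 = df["Customer"].eq(1)

# transform() returns one value per *filtered* row, so the result
# only carries the index labels of the Customer == 1 rows.
per_state = df[customer_1].groupby("State")["Score"].transform("mean")
print(per_state.index.tolist())  # [1, 2, 3, 4]

# Column assignment aligns on the index; labels 0, 5 and 6 have no
# matching value in per_state and therefore become NaN.
df["Output_Mean"] = per_state
print(df)
```

Because alignment is by index label rather than by position, this relies on the index labels of `df` being unique.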
Or just use `query`, without a mask:
df['Output_Mean'] = df.query('Customer == 1').groupby('State')['Score'].transform('mean')
Output:
Customer State Score Output_Mean
0 0 GA 1 NaN
1 1 GA 2 2.5
2 1 GA 3 2.5
3 1 NY 9 8.0
4 1 NY 7 8.0
5 0 DC 6 NaN
6 0 DC 4 NaN
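For anyone who wants to run this end to end, the following sketch applies the three variants from this answer to copies of the sample data and checks that they agree. The helper names (`via_loc`, `via_assign`, `via_query`) are made up for the comparison and are not part of the answer itself.

```python
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})

customer_1 = df["Customer"].eq(1)

# Variant 1: write only into the masked rows.
via_loc = df.copy()
via_loc.loc[customer_1, "Output_Mean"] = (
    via_loc[customer_1].groupby("State")["Score"].transform("mean")
)

# Variant 2: plain column assignment, relying on index alignment.
via_assign = df.copy()
via_assign["Output_Mean"] = (
    via_assign[customer_1].groupby("State")["Score"].transform("mean")
)

# Variant 3: filter with query() instead of a boolean mask.
via_query = df.copy()
via_query["Output_Mean"] = (
    via_query.query("Customer == 1").groupby("State")["Score"].transform("mean")
)

print(via_loc)
```

All three leave NaN in the rows where Customer is not 1 and the per-state mean of the Customer == 1 scores everywhere else.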
You can include the `Customer == 1` mask in the `groupby` (together with `State`) and use `np.where` to assign `df['output_mean']`:
import numpy as np

mask = df['Customer'] == 1
df['output_mean'] = np.where(mask, df.fillna({'State':'missing'}).groupby([mask,'State'])['Score'].transform('mean'), np.nan)
Output:
Customer State Score output_mean
0 0 GA 1 NaN
1 1 GA 2 2.5
2 1 GA 3 2.5
3 1 NY 9 8.0
4 1 NY 7 8.0
5 0 DC 6 NaN
6 0 DC 4 NaN
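The one-liner above does two things at once, so here is a sketch that splits it into steps on the sample data from the question. The intermediate name `grouped_mean` is introduced only for illustration. Grouping by `[mask, 'State']` computes a separate mean for the Customer == 1 and Customer != 1 rows of each state, and `np.where` then discards the values that belong to the Customer != 1 rows.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    "Customer": [0, 1, 1, 1, 1, 0, 0],
    "State": ["GA", "GA", "GA", "NY", "NY", "DC", "DC"],
    "Score": [1, 2, 3, 9, 7, 6, 4],
})

mask = df["Customer"] == 1

# Step 1: mean per (mask, State) group. Rows with Customer != 1 still
# get a value here, namely the mean of their own (False, State) group.
grouped_mean = (
    df.fillna({"State": "missing"})  # keep rows whose State is NaN in a group
      .groupby([mask, "State"])["Score"]
      .transform("mean")
)
print(grouped_mean.tolist())  # [1.0, 2.5, 2.5, 8.0, 8.0, 5.0, 5.0]

# Step 2: keep the mean only where the mask is True, NaN elsewhere.
df["output_mean"] = np.where(mask, grouped_mean, np.nan)
print(df)
```

The `fillna` call matters only when `State` has missing values: `groupby` drops NaN keys by default, so filling them with a placeholder keeps those rows in a group of their own.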