Complicated aggregation of a DataFrame in Python pandas?
I have a DataFrame like the one below:
import pandas as pd

df = pd.DataFrame({"VALUE": [100, 200, 100, 300, 500],
                   "PRODUCT_ID": [599, 200, 599, 599, 200],
                   "STATUS": ["active", "active", "not_active", "unknown", "active"],
                   "CLIENT": ["1", "1", "2", "2", "1"]})
I need to calculate the mean, median, and maximum of VALUE per PRODUCT_ID and per CLIENT, counting only rows whose STATUS is "active". I need a result DataFrame like this:
AVG = 266.6, because (500 + 200 + 100) / 3
MED = 200
MAX = 500, because 500 is the largest active VALUE for client 1
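The expected figures can be checked by hand before involving pandas. This is only a sketch: the list below is copied manually from the rows where CLIENT is "1" and STATUS is "active", and the variable names are made up for illustration.

```python
from statistics import mean, median

# VALUE entries for CLIENT "1" with STATUS "active", copied by hand
# from the DataFrame in the question (hypothetical variable name).
active_values_client_1 = [100, 200, 500]

avg = mean(active_values_client_1)    # (100 + 200 + 500) / 3
med = median(active_values_client_1)  # middle element of the sorted values
top = max(active_values_client_1)     # largest active value

print(round(avg, 2), med, top)
```

The numbers agree with the expected AVG, MED, and MAX only when all active rows of client 1 are pooled together, that is, when grouping by CLIENT alone.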
Try:
(df.query('STATUS=="active"')
.groupby(['CLIENT'])['VALUE']
.agg(['mean','median','max'])
.reindex(df.CLIENT.unique())
)
Output:
              mean  median    max
CLIENT
1       266.666667   200.0  500.0
2              NaN     NaN    NaN
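Client 2 shows NaN because it has no "active" rows: the filter drops it, and the reindex step adds it back as an empty row. If the AVG / MED / MAX column names from the question are wanted, named aggregation is one possible way to get them. This is a sketch that assumes pandas 0.25 or newer; the names `active` and `summary` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"VALUE": [100, 200, 100, 300, 500],
                   "PRODUCT_ID": [599, 200, 599, 599, 200],
                   "STATUS": ["active", "active", "not_active", "unknown", "active"],
                   "CLIENT": ["1", "1", "2", "2", "1"]})

# Keep only the active rows before grouping.
active = df[df["STATUS"] == "active"]

summary = (
    active.groupby("CLIENT")["VALUE"]
    # Named aggregation: keyword = output column, value = aggregation to apply.
    .agg(AVG="mean", MED="median", MAX="max")
    # Bring back clients that had no active rows; their statistics are NaN.
    .reindex(df["CLIENT"].unique())
)

print(summary)
```

Dropping the `.reindex(...)` call would leave only the clients that have at least one active row.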
Can you try this:
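# Select VALUE before aggregating: 'mean' is not defined for the string column STATUS,
# and recent pandas versions raise an error instead of silently dropping it.
df[df['STATUS'] == 'active'].groupby(['PRODUCT_ID', 'CLIENT'])[['VALUE']].agg(['mean', 'median', 'max'])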
Output:
                  VALUE
                   mean median  max
PRODUCT_ID CLIENT
200        1        350    350  500
599        1        100    100  100
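Grouping by both PRODUCT_ID and CLIENT gives a mean of 350 for product 200, which is not the 266.6 the question expects; that figure only appears when grouping by CLIENT alone. A small sketch that puts the two groupings side by side (the names `per_pair` and `per_client` are assumptions for illustration):

```python
import pandas as pd

df = pd.DataFrame({"VALUE": [100, 200, 100, 300, 500],
                   "PRODUCT_ID": [599, 200, 599, 599, 200],
                   "STATUS": ["active", "active", "not_active", "unknown", "active"],
                   "CLIENT": ["1", "1", "2", "2", "1"]})

active = df[df["STATUS"] == "active"]
stats = ["mean", "median", "max"]

# One row per (PRODUCT_ID, CLIENT) pair that has active rows.
per_pair = active.groupby(["PRODUCT_ID", "CLIENT"])["VALUE"].agg(stats)

# One row per CLIENT, pooling all of that client's active products.
per_client = active.groupby("CLIENT")["VALUE"].agg(stats)

print(per_pair)
print(per_client)
```

Which of the two is right depends on whether the statistics should be split by product or pooled per client; the worked numbers in the question point to the pooled version.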