Pandas: Aggregate differently based on group
Let's say I have some data as follows:
patient_id lab_type value
1 food 10
1 food 8
2 food 3
2 food 5
1 shot 4
1 shot 10
2 shot 2
2 shot 4
Then I group it with groupby(['patient_id', 'lab_type']).
After that, I'd like to aggregate on value, but differently for each lab_type: on food I'd like to aggregate using mean, and on shot I'd like to aggregate using sum.
The final data should look like this:
patient_id lab_type value
1 food 9 ((10 + 8) / 2)
2 food 4 ((3 + 5) / 2)
1 shot 14 (10 + 4)
2 shot 6 (2 + 4)
The answer in this post looked promising. Starting from there, I came up with the code below, which should work for you.
Test data:
import pandas as pd

# Columns: A = patient_id, B = lab_type, C = value
data = [{"A": 1, "B": "food", "C": 10},
        {"A": 1, "B": "food", "C": 8},
        {"A": 2, "B": "food", "C": 3},
        {"A": 2, "B": "food", "C": 5},
        {"A": 1, "B": "shot", "C": 4},
        {"A": 1, "B": "shot", "C": 10},
        {"A": 2, "B": "shot", "C": 2},
        {"A": 2, "B": "shot", "C": 4}]
df = pd.DataFrame(data)
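Actual code:
# Mean for "food" groups, sum for "shot" groups.
# Note: this relies on column B still being present in each group's frame.
res = df.groupby(['A', 'B']).apply(
    lambda x: pd.Series(
        {"value": x.C.mean() if x.iloc[0].B == "food" else x.C.sum()}
    )
)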
This results in:
        value
A B
1 food      9
  shot     14
2 food      4
  shot      6
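For what it's worth, the same per-group choice can also be written without apply, by looping over the groups and looking the aggregation up in a dict. This is only a minimal sketch, assuming the question's column names; the AGG_BY_LAB mapping and the variable names are made up for illustration and are not part of the answer above.

```python
import pandas as pd

# Same data as in the question, with the question's column names.
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food", "food", "food", "food", "shot", "shot", "shot", "shot"],
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Hypothetical mapping: which aggregation to use for each lab_type.
AGG_BY_LAB = {"food": "mean", "shot": "sum"}

rows = []
# Iterating over a groupby yields (key, sub-frame) pairs; with two grouping
# columns the key is a (patient_id, lab_type) tuple.
for (patient_id, lab_type), group in df.groupby(["patient_id", "lab_type"]):
    rows.append({
        "patient_id": patient_id,
        "lab_type": lab_type,
        "value": group["value"].agg(AGG_BY_LAB[lab_type]),
    })

result = pd.DataFrame(rows)
print(result)
```

Adding a third lab_type would then only mean adding one entry to the mapping, at the cost of a plain Python loop over the groups.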
Let P be your DataFrame.
P[P.lab_type == "food"].groupby(['patient_id'])['value'].aggregate('mean')
Do the same for the shot group (with 'sum' instead of 'mean') and concatenate the results.
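Spelled out, both halves plus the concatenation might look roughly like the sketch below. It assumes the question's data and column names; the variable names food, shot, combined and flat are my own, and passing keys/names to pd.concat is just one way to keep track of which rows came from which lab_type.

```python
import pandas as pd

# Same data as in the question.
P = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food", "food", "food", "food", "shot", "shot", "shot", "shot"],
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Aggregate each lab_type separately, each with its own function.
food = P[P.lab_type == "food"].groupby("patient_id")["value"].mean()
shot = P[P.lab_type == "shot"].groupby("patient_id")["value"].sum()

# Stack the two results; `keys` labels each piece so the lab_type is not lost.
combined = pd.concat(
    [food, shot], keys=["food", "shot"], names=["lab_type", "patient_id"]
)

# Turn the two index levels back into ordinary columns.
flat = combined.reset_index()
print(flat)
```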
On food I'd like to aggregate using mean and on shot I'd like to aggregate using sum.
Just use .apply and pass a custom function:
def calc(g):
    # g is the sub-DataFrame for one (patient_id, lab_type) group
    if g.iloc[0].lab_type == 'shot':
        return g.value.sum()
    else:
        return g.value.mean()

result = df.groupby(['patient_id', 'lab_type']).apply(calc)
Here calc receives the per-group DataFrame, as described in pandas' split-apply-combine documentation. As a result, you get what you wanted:
patient_id  lab_type
1           food         9
            shot        14
2           food         4
            shot         6
dtype: float64
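If calling a Python function once per group turns out to be slow on larger data, one alternative is to compute both aggregates for every group in a single pass and then pick the right one per row. The following is only a sketch under the question's column names; the intermediate stats frame and the use of np.where are my own choices, not something from the answers here.

```python
import numpy as np
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "patient_id": [1, 1, 2, 2, 1, 1, 2, 2],
    "lab_type": ["food", "food", "food", "food", "shot", "shot", "shot", "shot"],
    "value": [10, 8, 3, 5, 4, 10, 2, 4],
})

# Compute both candidate aggregates for every (patient_id, lab_type) group.
stats = (
    df.groupby(["patient_id", "lab_type"])["value"]
      .agg(["mean", "sum"])
      .reset_index()
)

# Keep the mean for food rows and the sum for every other row.
stats["value"] = np.where(stats["lab_type"] == "food", stats["mean"], stats["sum"])

result = stats[["patient_id", "lab_type", "value"]]
print(result)
```

This does some wasted work (a sum for food, a mean for shot), which is usually a fair trade for staying inside pandas' built-in aggregations.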
I tried modifying the answer:
You can use mean and sum, and then concat with reset_index:
print(df)
   patient_id lab_type  value
0           1     food     10
1           1     food      8
2           2     food      3
3           2     food      5
4           1     shot      4
5           1     shot     10
6           2     shot      2
7           2     shot      4
# Select the value column explicitly so the string column lab_type is not aggregated
df1 = df[df.lab_type == "food"].groupby(['patient_id'])[['value']].mean()
df1['lab_type'] = 'food'
print(df1)
            value lab_type
patient_id
1               9     food
2               4     food
df2 = df[df.lab_type == "shot"].groupby(['patient_id'])[['value']].sum()
df2['lab_type'] = 'shot'
print(df2)
            value lab_type
patient_id
1              14     shot
2               6     shot
print(pd.concat([df1, df2]).reset_index())
   patient_id  value lab_type
0           1      9     food
1           2      4     food
2           1     14     shot
3           2      6     shot