How to group a pandas df by multiple conditions, take the mean, and append it to the df?
I have a df that looks like this:
import pandas as pd
df = pd.DataFrame({
    'Time': [1, 2, 7, 10, 15, 16, 77, 98, 999, 1000, 1121, 1245, 1373, 1490, 1555],
    'Act_cat': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4],
    'Count': [6, 2, 4, 1, 2, 1, 8, 4, 3, 1, 4, 13, 3, 1, 2],
    'Moving': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1]})
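I want to group by identical values in 'Act_cat', take the mean of the 'Count' column over the rows in each group where 'Moving' == 1, and map the result back to the df.
I tried the following, but it averages all rows of the 'Count' column, not only the rows where 'Moving' == 1.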
group1 = (df['Moving'].eq(1) & (df['Act_cat'].diff().abs() > 0)).cumsum()
mean_values = df.groupby(group1)["Count"].mean()
df['newcol'] = group1.map(mean_values)
Please let me know how to solve this!
Thanks,
塔尼
IIUC, use:
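# Parenthesize the comparison: & binds more tightly than >
group1 = (df['Moving'].eq(1) & (df['Act_cat'].diff().abs() > 0)).cumsum()
# Filter to Moving == 1 before grouping so only those rows enter the mean
mean_values = df[df['Moving'].eq(1)].groupby(group1)["Count"].mean()
df['newcol'] = group1.map(mean_values)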
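To make the intermediate values in the snippet above easier to inspect, here is a minimal self-contained sketch that rebuilds the question's df and prints the group key and the per-group means. The names `starts`, `group_labels` and `means_by_group` are only illustrative and do not appear in the original post.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': [1, 2, 7, 10, 15, 16, 77, 98, 999, 1000, 1121, 1245, 1373, 1490, 1555],
    'Act_cat': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4],
    'Count': [6, 2, 4, 1, 2, 1, 8, 4, 3, 1, 4, 13, 3, 1, 2],
    'Moving': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# True only on rows where Moving == 1 and Act_cat differs from the previous row
# (illustrative name, not from the original post).
starts = df['Moving'].eq(1) & (df['Act_cat'].diff().abs() > 0)

# The cumulative sum turns each True into the start of a new group label.
group1 = starts.cumsum()

# Means are computed from the Moving == 1 rows only, keyed by group label.
mean_values = df[df['Moving'].eq(1)].groupby(group1)['Count'].mean()

group_labels = group1.tolist()
means_by_group = mean_values.to_dict()
print(group_labels)
print(means_by_group)
```

With this data the only group boundary falls at index 10. The change of Act_cat at index 5 happens on a row where Moving is 0, so rows 0 to 9 share one label.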
Alternative:
group1 = (df['Moving'].eq(1) & (df['Act_cat'].diff().abs() > 0)).cumsum()
# where() sets Count to NaN on rows with Moving != 1; the mean skips NaN
df['newcol'] = df['Count'].where(df['Moving'].eq(1)).groupby(group1).transform('mean')
print(df)
    Time  Act_cat  Count  Moving  newcol
0      1        1      6       1     4.6
1      2        1      2       0     4.6
2      7        1      4       1     4.6
3     10        1      1       0     4.6
4     15        1      2       1     4.6
5     16        2      1       0     4.6
6     77        2      8       1     4.6
7     98        2      4       0     4.6
8    999        2      3       1     4.6
9   1000        2      1       0     4.6
10  1121        4      4       1     3.0
11  1245        4     13       0     3.0
12  1373        4      3       1     3.0
13  1490        4      1       0     3.0
14  1555        4      2       1     3.0
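In the output above, the rows with Act_cat 1 and 2 share one mean because of how group1 is built. If the intent is instead one mean per distinct Act_cat value, which is one possible reading of the question and not something the answer confirms, a sketch along the following lines may work. The column name `mean_by_cat` and the variables `moving_counts` and `per_cat` are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Time': [1, 2, 7, 10, 15, 16, 77, 98, 999, 1000, 1121, 1245, 1373, 1490, 1555],
    'Act_cat': [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 4, 4, 4, 4, 4],
    'Count': [6, 2, 4, 1, 2, 1, 8, 4, 3, 1, 4, 13, 3, 1, 2],
    'Moving': [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
})

# Keep Count only where Moving == 1; all other rows become NaN.
moving_counts = df['Count'].where(df['Moving'].eq(1))

# Group directly by the Act_cat values (assumption: one mean per category).
# transform('mean') ignores NaN and broadcasts the result to every row of the group.
df['mean_by_cat'] = moving_counts.groupby(df['Act_cat']).transform('mean')

# One value per category, for a quick check.
per_cat = df.groupby('Act_cat')['mean_by_cat'].first().to_dict()
print(per_cat)
```

With the question's data this gives 4.0 for Act_cat 1, 5.5 for Act_cat 2 and 3.0 for Act_cat 4.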