
Segmenting total spend based on multiple conditions


ID spend month_diff    
12  10    -1         
12  10    -2         
12  20     1        
12  30     2         
13  15    -1         
13  20    -2        
13  25     1        
13  30     2        

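For each ID, I want to get spend_total based on the month difference. A negative month_diff means the customer's spend last year, and a positive month_diff means spend this year, so I want to compare each customer's spend last year with their spend this year. The conditions are: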


if month_diff >= -2 and month_diff < 0: cumulative spend over the negative months, flag = pre
if month_diff > 0 and month_diff <= 2: cumulative spend over the positive months, flag = post


Expected output:

ID spend month_diff tot_spend flag
12  10    -2         20         pre
12  30     2         50         post
13  20    -2         35         pre
13  30     2         55         post

Use numpy.sign with Series.shift, Series.ne and Series.cumsum to number the consecutive runs of equal sign, then pass them to DataFrame.groupby and aggregate with GroupBy.last (for month_diff) and GroupBy.sum (for spend).
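The grouping key can be checked on its own. This minimal sketch retypes the question's sample data into a DataFrame (an assumption about how the data is loaded) and prints the run number assigned to each row. Because the first row has no previous value, `shift` gives NaN there, `ne` returns True, and numbering starts at 1.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question's table (assumed, not loaded from a file)
df = pd.DataFrame({'ID': [12, 12, 12, 12, 13, 13, 13, 13],
                   'spend': [10, 10, 20, 30, 15, 20, 25, 30],
                   'month_diff': [-1, -2, 1, 2, -1, -2, 1, 2]})

sign = np.sign(df['month_diff'])   # -1 = last year, 1 = this year
changed = sign.ne(sign.shift())    # True wherever a new run of equal signs begins
run_id = changed.cumsum()          # running count of run starts = consecutive-run number
print(run_id.tolist())             # [1, 1, 2, 2, 3, 3, 4, 4]
```

Grouping by both `ID` and this run number keeps runs from different customers apart even if two neighbouring IDs happened to share the same sign at their boundary.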

Finally, use numpy.select to set the flag:

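import numpy as np
import pandas as pd
df = pd.DataFrame({'ID': [12, 12, 12, 12, 13, 13, 13, 13],
                   'spend': [10, 10, 20, 30, 15, 20, 25, 30],
                   'month_diff': [-1, -2, 1, 2, -1, -2, 1, 2]})
# sign of month_diff: -1 = last year, 1 = this year; a new group starts whenever it changes
a = np.sign(df['month_diff'])
g = a.ne(a.shift()).cumsum()
# last month_diff and total spend per ID and consecutive run
df1 = (df.groupby(['ID', g])
         .agg({'month_diff':'last', 'spend':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index())
df1['flag'] = np.select([df1['month_diff'].ge(-2) & df1['month_diff'].lt(0),
                         df1['month_diff'].gt(0) & df1['month_diff'].le(2)],
                         ['pre','post'], default='another val')
print(df1)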
   ID  month_diff  spend  flag
0  12          -2     20   pre
1  12           2     50  post
2  13          -2     35   pre
3  13           2     55  post
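An alternative sketch, not the method used above: since both conditions depend only on each row's own month_diff, you can flag every row first and then aggregate per ID and flag. Named aggregation then reproduces the exact columns of the expected output, including `tot_spend`. The sample DataFrame is retyped from the question, and the label `'other'` for rows outside both ranges is an arbitrary placeholder.

```python
import numpy as np
import pandas as pd

# Sample data retyped from the question's table
df = pd.DataFrame({'ID': [12, 12, 12, 12, 13, 13, 13, 13],
                   'spend': [10, 10, 20, 30, 15, 20, 25, 30],
                   'month_diff': [-1, -2, 1, 2, -1, -2, 1, 2]})

# Apply the question's conditions to every row before aggregating
conditions = [df['month_diff'].ge(-2) & df['month_diff'].lt(0),   # -2 <= month_diff < 0
              df['month_diff'].gt(0) & df['month_diff'].le(2)]    # 0 < month_diff <= 2
df['flag'] = np.select(conditions, ['pre', 'post'], default='other')

# One row per ID and flag; sort=False keeps 'pre' before 'post' (first-appearance order)
out = (df[df['flag'] != 'other']
         .groupby(['ID', 'flag'], sort=False)
         .agg(spend=('spend', 'last'),
              month_diff=('month_diff', 'last'),
              tot_spend=('spend', 'sum'))
         .reset_index())
out = out[['ID', 'spend', 'month_diff', 'tot_spend', 'flag']]
print(out)
```

Unlike the run-based grouping, this puts all rows of one ID with the same flag into a single group even when they are not consecutive, which may or may not be what you want if the data is not sorted by month.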