Segmented totals based on multiple conditions
DataFrame:
ID spend month_diff
12 10 -1
12 10 -2
12 20 1
12 30 2
13 15 -1
13 20 -2
13 25 1
13 30 2
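I want to get spend_total per ID based on the month difference. A negative month_diff means the customer's spend last year, and a positive one means spend this year, so I want to compare each customer's spend last year with this year. The conditions are as follows:
Conditions: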
if month_diff >= -2 and < 0 then cumulative spend for negative months - flag=pre
if month_diff > 0 and <=2 then cumulative spend for positive months - flag=post
Desired DataFrame:
ID spend month_diff tot_spend flag
12 10 -2 20 pre
12 30 2 50 post
13 20 -2 35 pre
13 30 2 55 post
Use numpy.sign with Series.shift, Series.ne and Series.cumsum to label consecutive groups of rows with the same sign, then pass those labels to DataFrame.groupby and aggregate with GroupBy.last and sum. Finally, use numpy.select to set the flag:
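import numpy as np

# df is the DataFrame from the question
# sign of month_diff: -1 = last year, 1 = this year
a = np.sign(df['month_diff'])
# label consecutive runs of rows that share the same sign
g = a.ne(a.shift()).cumsum()
# per ID and run: keep the last month_diff and sum the spend
df1 = (df.groupby(['ID', g])
         .agg({'month_diff':'last', 'spend':'sum'})
         .reset_index(level=1, drop=True)
         .reset_index())
# flag each run according to the conditions in the question
df1['flag'] = np.select([df1['month_diff'].ge(-2) & df1['month_diff'].lt(0),
                         df1['month_diff'].gt(0) & df1['month_diff'].le(2)],
                        ['pre','post'], default='another val')
print(df1)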
   ID  month_diff  spend  flag
0  12          -2     20   pre
1  12           2     50  post
2  13          -2     35   pre
3  13           2     55  post
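The output above has a single spend column holding the sum, while the desired DataFrame keeps the last spend and adds a separate tot_spend. A minimal self-contained sketch of one way to get that layout, assuming named aggregation is acceptable here; the label "block" and the default value "other" are hypothetical names chosen for illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID": [12, 12, 12, 12, 13, 13, 13, 13],
    "spend": [10, 10, 20, 30, 15, 20, 25, 30],
    "month_diff": [-1, -2, 1, 2, -1, -2, 1, 2],
})

# Sign of each month_diff: -1 for last year, 1 for this year.
a = np.sign(df["month_diff"])

# A new block starts wherever the sign differs from the previous row;
# the running sum of those change points labels each consecutive block.
g = a.ne(a.shift()).cumsum().rename("block")  # "block" is a hypothetical name

out = (
    df.groupby(["ID", g])
      .agg(spend=("spend", "last"),            # last spend in the block
           month_diff=("month_diff", "last"),  # last month_diff in the block
           tot_spend=("spend", "sum"))         # total spend in the block
      .reset_index(level="block", drop=True)
      .reset_index()
)

# Same conditions as in the question; "other" is a hypothetical fallback.
out["flag"] = np.select(
    [out["month_diff"].ge(-2) & out["month_diff"].lt(0),
     out["month_diff"].gt(0) & out["month_diff"].le(2)],
    ["pre", "post"],
    default="other",
)
print(out)
```

Note that the block labels depend on row order: rows with the same sign must be adjacent within each ID, as they are in the sample data. If they are not, sorting first or grouping directly on the sign would likely be a safer choice.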