Groupby cumsum that resets to zero and restarts whenever the running total drops below zero
I am trying to create a cumulative-sum column that resets to zero whenever the running total drops below zero. My data looks like this:
| id | treatment | value |
|---|---|---|
| 1 | drugs | 66 |
| 1 | drugs | 33 |
| 1 | drugs | -100 |
| 1 | drugs | 11 |
| 1 | drugs | 30 |
| 1 | drugs | -50 |
Desired result:
| id | treatment | value | cumsum |
|---|---|---|---|
| 1 | drugs | 66 | 66 |
| 1 | drugs | 33 | 99 |
| 1 | drugs | -100 | 0 |
| 1 | drugs | 11 | 11 |
| 1 | drugs | 30 | 41 |
| 1 | drugs | -50 | 0 |
Is there a solution close to this attempt?
df.groupby(['id','treatment'])['value'].apply(lambda x: 0 if x.cumsum() < 0 else x.cumsum())
Building on a related answer, you could do this:
df['cumsum'] = df.groupby(df['value'].lt(0).astype(int).diff().ne(0).cumsum())['value'].cumsum().clip(lower=0)
Output:
>>> df
   id treatment  value  cumsum
0   1     drugs     66      66
1   1     drugs     33      99
2   1     drugs   -100       0
3   1     drugs     11      11
4   1     drugs     30      41
5   1     drugs    -50       0
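One point worth checking before relying on the one-liner: it groups on runs of negative and non-negative values, not on `id` and `treatment`, and it restarts the sum at every sign change. That matches the sample because each negative value is large enough to wipe out the running total. The sketch below compares it against a plain-Python reference loop; the function names `reset_cumsum` and `sign_run_cumsum` and the `partial` series are made up for illustration.

```python
import pandas as pd

def reset_cumsum(values):
    """Plain-Python reference: running total that restarts at 0 when it goes negative."""
    total, out = 0, []
    for v in values:
        total = max(total + v, 0)
        out.append(total)
    return out

def sign_run_cumsum(s):
    """The one-liner above, wrapped in a function (hypothetical name)."""
    # Label consecutive runs of negative / non-negative values.
    runs = s.lt(0).astype(int).diff().ne(0).cumsum()
    return s.groupby(runs).cumsum().clip(lower=0)

sample = pd.Series([66, 33, -100, 11, 30, -50])  # data from the question
partial = pd.Series([66, -10, 5])  # hypothetical: the negative value does not wipe out the total

same_on_sample = sign_run_cumsum(sample).tolist() == reset_cumsum(sample)
same_on_partial = sign_run_cumsum(partial).tolist() == reset_cumsum(partial)
print(same_on_sample, same_on_partial)
```

On the `partial` series the reference loop keeps the running total going (66, 56, 61), while the sign-run version restarts it (66, 0, 5), so the one-liner seems safe only when every negative value is known to push the total below zero.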
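In this case I would recommend using this numba function:

from numba import njit

@njit
def cumli(x, lim):
    # Running total that restarts at zero whenever it drops below `lim`.
    total = 0
    result = []
    for y in x:
        total += y
        if total < lim:
            total = 0
        result.append(total)
    return result

# Apply per (id, treatment) group; .values passes a NumPy array to the compiled function.
df['cumsum'] = df.groupby(['id', 'treatment'])['value'].transform(lambda x: cumli(x.values, 0))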
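If numba is not available, a loop-free alternative is possible for the specific case of a zero threshold. This is a different technique from the function above, not a rewrite of it: it uses the fact that a running total floored at zero equals the plain cumulative sum minus the lowest value that cumulative sum has reached so far (capped at zero). The helper name `reset_cumsum_vectorized` and the rows for `id` 2 are assumptions added for illustration.

```python
import pandas as pd

def reset_cumsum_vectorized(s):
    """Cumulative sum floored at zero, without an explicit Python loop."""
    c = s.cumsum()
    # Lowest point the plain cumsum has reached so far, capped at 0.
    floor = c.cummin().clip(upper=0)
    return c - floor

df = pd.DataFrame({
    "id": [1, 1, 1, 1, 1, 1, 2, 2, 2],  # id 2 rows are hypothetical
    "treatment": ["drugs"] * 9,
    "value": [66, 33, -100, 11, 30, -50, 5, -3, -4],
})

# transform keeps the original row order and index, one result per input row.
df["cumsum"] = df.groupby(["id", "treatment"])["value"].transform(reset_cumsum_vectorized)
print(df)
```

For integer data this should agree with the loop exactly. With floats the subtraction can introduce small rounding differences, so the numba loop may still be the safer choice there, and it is also the one that supports a threshold other than zero.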