pandas: fill a column with the cumulative sum of previous rows (reset after every NaN)

I found a solution that works row by row, but is there a fast way to do this column-wise?

Here is a quick example of the DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame([['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['SF',15.0],
['GB',5.7],
[1.0,np.nan],
['GB',43.76],
['TEN',17.3],
['ARI',0.2],
['ATL',12.3],
['HOU',21.1],
['ARI',1.7],
['ATL',12.6],
['BUF',7.0],
['GB',5.7],
[2.0,np.nan]], columns = ['team','points'])

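I have been trying to work with df['sum'] = df['points'].cumsum(). It does compute the cumulative sum, but what I need is for it to restart when/if it hits a NaN, rather than skipping over it.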

Use GroupBy.cumsum with a helper Series, built by testing for missing values and taking another cumsum of the result:

df['sum'] = df.groupby(df['points'].isna().cumsum())['points'].cumsum()
print (df)
   team  points     sum
0    GB   43.76   43.76
1   TEN   17.30   61.06
2   ARI    0.20   61.26
3   ATL   12.30   73.56
4   HOU   21.10   94.66
5   ARI    1.70   96.36
6   ATL   12.60  108.96
7    SF   15.00  123.96
8    GB    5.70  129.66
9     1     NaN     NaN
10   GB   43.76   43.76
11  TEN   17.30   61.06
12  ARI    0.20   61.26
13  ATL   12.30   73.56
14  HOU   21.10   94.66
15  ARI    1.70   96.36
16  ATL   12.60  108.96
17  BUF    7.00  115.96
18   GB    5.70  121.66
19    2     NaN     NaN

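Not sure whether this is the same as jezrael's solution, but I suggest creating a column that represents the summation groups (as in a related answer, except that here you check for np.nan instead of 0). Then take the cumulative sum within each of those groups.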
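The explicit group column described above can be sketched as follows. This is only an illustration under assumptions: the small frame and the column name group are made up for the example and are not taken from the question's data.

```python
import numpy as np
import pandas as pd

# Small illustrative frame (hypothetical values, not the question's data).
df = pd.DataFrame({"points": [1.0, 2.0, np.nan, 3.0, 4.0, np.nan, 5.0]})

# Every NaN increments the counter, so each NaN row and the rows
# that follow it share a new group id.
df["group"] = df["points"].isna().cumsum()

# The cumulative sum restarts inside each group; NaN rows stay NaN.
df["sum"] = df.groupby("group")["points"].cumsum()

print(df)
```

Keeping the group ids in their own column makes it easy to inspect where each restart happens. The column can be dropped afterwards if it is not needed.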

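Another approach that avoids groupby, assuming all points are positive: take the cumsum of points and ffill the NaN rows with the previous value, then subtract the cummax of that value taken at the rows where points is NaN, for example: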

df['s'] = df['points'].cumsum().ffill()
df['s'] -= (df['s']*df['points'].isna()).cummax()
print (df)
   team  points       s
0    GB   43.76   43.76
1   TEN   17.30   61.06
2   ARI    0.20   61.26
3   ATL   12.30   73.56
4   HOU   21.10   94.66
5   ARI    1.70   96.36
6   ATL   12.60  108.96
7    SF   15.00  123.96
8    GB    5.70  129.66
9     1     NaN    0.00
10   GB   43.76   43.76
11  TEN   17.30   61.06
12  ARI    0.20   61.26
13  ATL   12.30   73.56
14  HOU   21.10   94.66
15  ARI    1.70   96.36
16  ATL   12.60  108.96
17  BUF    7.00  115.96
18   GB    5.70  121.66
19    2     NaN    0.00
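To see why the positivity assumption matters, here is a minimal sketch with a made-up Series that contains a negative value. The data and the variable names (expected, mismatch) are assumptions for illustration. The groupby version serves as the reference result.

```python
import numpy as np
import pandas as pd

# Hypothetical data with a negative value before the first NaN.
points = pd.Series([5.0, -10.0, np.nan, 1.0, 2.0, np.nan, 3.0])

# Reference: groupby-based reset, which does not depend on the sign.
expected = points.groupby(points.isna().cumsum()).cumsum()

# cummax-based version from the answer above.
s = points.cumsum().ffill()
s -= (s * points.isna()).cummax()

# Compare only the rows that hold a real value.
valid = points.notna()
mismatch = s[valid].ne(expected[valid])

# Index labels where the two approaches disagree.
print(mismatch[mismatch].index.tolist())
```

The running total is negative at the first NaN, so the cummax never rises above 0 and nothing is subtracted, which means the sum is not reset. With non-negative points the running total cannot decrease, and the two approaches agree on every non-NaN row.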