pandas DataFrame: conditional cumulative sum

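Is there a way to cumulatively sum the values in one column based on the value of another column, and reset the running sum as soon as the condition stops holding? In the example below, the condition is that 'Rbin' equals 1.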

import numpy as np
import pandas

random_data_frame = pandas.DataFrame()
random_data_frame['Rbin'] = [0, 1, 1, 1, 0, 0, 1, 0]
random_data_frame['Rmomentum'] = [-0.07, 0.03, 0.06, 0.005, -0.008, -0.8, 0.8, -0.5]

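You can create a temporary index that identifies each run of consecutive equal Rbin values, then use groupby with cumsum over those runs, and finally set the cumulative sum to np.nan wherever Rbin is zero.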

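# Label each run of consecutive equal Rbin values (the first row's diff is NaN, which counts as a change)
random_data_frame['new_id'] = (random_data_frame.Rbin.diff() != 0).cumsum()
# Cumulative sum within each run; the result keeps the original index, so it can be assigned directly
random_data_frame['cumulative_sum'] = random_data_frame.groupby('new_id')['Rmomentum'].cumsum()
# Blank out the rows where the condition does not hold
random_data_frame.loc[random_data_frame.Rbin == 0, 'cumulative_sum'] = np.nan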

This is the result for your example:

   Rbin  Rmomentum  new_id  cumulative_sum
0     0     -0.070       1             NaN
1     1      0.030       2           0.030
2     1      0.060       2           0.090
3     1      0.005       2           0.095
4     0     -0.008       3             NaN
5     0     -0.800       3             NaN
6     1      0.800       4           0.800
7     0     -0.500       5             NaN
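
To see why the new_id column labels the runs, it may help to look at the intermediate steps one at a time. The following is a minimal sketch using only the Rbin values from the question; the names rbin, changed, run_id, and steps are made up for illustration.

```python
import pandas as pd

rbin = pd.Series([0, 1, 1, 1, 0, 0, 1, 0], name="Rbin")

# diff() is NaN on the first row and non-zero wherever the value changes.
# NaN != 0 evaluates to True, so the first row also starts a run.
changed = rbin.diff() != 0

# Each True increments the counter, so every run gets its own label.
run_id = changed.cumsum()

steps = pd.DataFrame(
    {"Rbin": rbin, "diff": rbin.diff(), "changed": changed, "run_id": run_id}
)
print(steps)
```

The run_id column here should match new_id in the table above.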

A condensed version is:

random_data_frame['cumulative_sum'] = np.where(
    random_data_frame.Rbin == 0,
    np.nan,
    random_data_frame.groupby((random_data_frame.Rbin.diff() != 0).cumsum())['Rmomentum'].cumsum()
)
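
If this is needed in more than one place, the same idea could be wrapped in a small helper. This is only a sketch under some assumptions: the function name cumsum_while and its parameters are hypothetical, it groups by runs of the condition (flag == flag_value) rather than by runs of the raw Rbin value, which is equivalent for a 0/1 column, and it uses Series.where in place of np.where so the original index is preserved.

```python
import numpy as np
import pandas as pd


def cumsum_while(df, value_col, flag_col, flag_value=1):
    """Cumulative sum of value_col within consecutive runs where flag_col == flag_value.

    Hypothetical helper for illustration; rows outside such runs are NaN.
    """
    active = df[flag_col] == flag_value
    # A new run starts whenever the condition flips relative to the previous row.
    run_id = (active != active.shift()).cumsum()
    sums = df[value_col].groupby(run_id).cumsum()
    # Keep the running sum only where the condition holds.
    return sums.where(active)


# Same data as in the question, but with a non-default index.
df = pd.DataFrame(
    {
        "Rbin": [0, 1, 1, 1, 0, 0, 1, 0],
        "Rmomentum": [-0.07, 0.03, 0.06, 0.005, -0.008, -0.8, 0.8, -0.5],
    },
    index=list("abcdefgh"),
)
result = cumsum_while(df, "Rmomentum", "Rbin")
print(result)
```

Because the result is a Series aligned to df's index, it can be assigned back with df['cumulative_sum'] = result even when the index is not a default range.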