pandas DataFrame: conditional cumulative sum
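Is there a way to cumulatively sum the values in one column based on the values in another column, resetting the running sum as soon as the condition no longer holds? In the example below, the condition is that 'Rbin' equals 1.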
import pandas

random_data_frame = pandas.DataFrame()
random_data_frame['Rbin'] = [0, 1, 1, 1, 0, 0, 1, 0]
random_data_frame['Rmomentum'] = [-0.07, 0.03, 0.06, 0.005, -0.008, -0.8, 0.8, -0.5]
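You can create a temporary index that identifies runs of consecutive equal Rbin values, then use groupby and cumsum on those runs, and finally set the cumulative sum to np.nan wherever Rbin is zero.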
import numpy as np
# new_id labels each run of consecutive equal Rbin values
random_data_frame['new_id'] = (random_data_frame.Rbin.diff() != 0).cumsum()
random_data_frame['cumulative_sum'] = random_data_frame.groupby('new_id')['Rmomentum'].cumsum()
# blank out the rows where Rbin is zero
random_data_frame.loc[random_data_frame.Rbin == 0, 'cumulative_sum'] = np.nan
Here is the result for your example:
   Rbin  Rmomentum  new_id  cumulative_sum
0     0     -0.070       1             NaN
1     1      0.030       2           0.030
2     1      0.060       2           0.090
3     1      0.005       2           0.095
4     0     -0.008       3             NaN
5     0     -0.800       3             NaN
6     1      0.800       4           0.800
7     0     -0.500       5             NaN
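The new_id column is what makes the reset work, so it may help to see how it is built. The sketch below rebuilds it one step at a time on the same Rbin values. The names `rbin`, `changed` and `run_id` are illustrative only and do not appear in the answer above.

```python
import pandas as pd

rbin = pd.Series([0, 1, 1, 1, 0, 0, 1, 0], name="Rbin")

# diff() is NaN on the first row and non-zero wherever the value changes,
# so "!= 0" flags the first row of every run of equal values.
changed = rbin.diff() != 0

# Cumulatively summing the flags gives each run its own integer label.
run_id = changed.cumsum()

print(run_id.tolist())  # [1, 2, 2, 2, 3, 3, 4, 5]
```

Grouping by these labels means each cumsum starts again at the first row of a run, which is the reset behaviour asked for.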
The compact version is:
random_data_frame['cumulative_sum'] = np.where(
    random_data_frame.Rbin == 0,
    np.nan,
    random_data_frame.groupby(
        (random_data_frame.Rbin.diff() != 0).cumsum()
    )['Rmomentum'].cumsum(),
)
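If the same pattern is needed on several frames, it could be wrapped in a small helper. This is only a sketch of the approach above, not part of the original answer: the function name `conditional_cumsum` and its parameters are hypothetical, and it uses `Series.where` in place of `np.where` so that the result keeps the frame's index. The non-default index in the example is there to show that the values stay aligned.

```python
import pandas as pd


def conditional_cumsum(df, value_col, flag_col):
    """Running sum of value_col per run of equal flag_col values, NaN where the flag is 0."""
    # Label consecutive runs of equal flag values.
    run_id = (df[flag_col].diff() != 0).cumsum()
    # Cumulative sum within each run; the result keeps df's index.
    sums = df[value_col].groupby(run_id).cumsum()
    # Keep the sums only where the flag is non-zero; other rows become NaN.
    return sums.where(df[flag_col] != 0)


df = pd.DataFrame(
    {
        "Rbin": [0, 1, 1, 1, 0, 0, 1, 0],
        "Rmomentum": [-0.07, 0.03, 0.06, 0.005, -0.008, -0.8, 0.8, -0.5],
    },
    index=list("abcdefgh"),  # non-default index, to show alignment is preserved
)
df["cumulative_sum"] = conditional_cumsum(df, "Rmomentum", "Rbin")
print(df)
```

Returning a Series rather than a NumPy array lets pandas align the result by label when it is assigned back to the frame.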