Calculate cumulative sum while value of another column stays the same
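For the df below, I want to calculate the cumulative sum of the Inst_Dist column and save it as Cumu_Dist, for as long as the value of WDir_Deg stays the same. When the value in WDir_Deg changes, the accumulation needs to start over.
So,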
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | NaN
1 | 285 | 17 | NaN
2 | 285 | 19 | NaN
3 | 287 | 19 | NaN
4 | 289 | 10 | NaN
becomes
index | WDir_Deg | Inst_Dist | Cumu_Dist
0 | 289 | 20 | 20
1 | 285 | 17 | 17
2 | 285 | 19 | 36
3 | 287 | 19 | 19
4 | 289 | 10 | 10
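My non-idiomatic (and extremely slow) Python code is below. I would appreciate it if someone could show me how to make it faster and more idiomatic.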
prev_angle = -1
curr_cumu_dist = 0
for curr_ind in df.index:
    curr_angle = df.loc[curr_ind, 'WDir_Deg']
    if prev_angle == curr_angle:
        curr_cumu_dist += df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
    else:
        prev_angle = curr_angle
        curr_cumu_dist = df.loc[curr_ind, 'Inst_Dist']
        df.loc[curr_ind, 'Cumu_Dist'] = curr_cumu_dist
This is a bit tricky. Referring to this question and its answers, "Pandas groupby cumulative sum",
I came up with this solution:
df['Cumu_Dist'] = df.groupby('WDir_Deg').Inst_Dist.cumsum()
which returns
index WDir_Deg Inst_Dist Cumu_Dist
0 0 285 17 17
1 1 285 19 36
2 2 287 19 19
3 3 289 20 20
This uses pandas version 0.23.4.
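One thing worth checking with the approach above: grouping on the WDir_Deg values themselves puts every row with the same angle into one group, whether or not the rows are adjacent. A minimal sketch on the question's sample data (the variable name by_value is only illustrative) suggests that the last row then continues the sum from row 0 instead of restarting:

```python
import pandas as pd

# Sample data from the question: 289 appears in rows 0 and 4, which are not adjacent.
df = pd.DataFrame({
    "WDir_Deg": [289, 285, 285, 287, 289],
    "Inst_Dist": [20, 17, 19, 19, 10],
})

# Grouping by the raw angle merges rows 0 and 4 into a single group.
by_value = df.groupby("WDir_Deg")["Inst_Dist"].cumsum()
print(by_value.tolist())  # [20, 17, 36, 19, 30]
```

The question expects 10 for the last row, so grouping by value alone only matches the desired output when each angle occurs in a single consecutive run.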
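Use a helper Series that labels consecutive groups: compare the WDir_Deg column with its shifted copy using ne (not equal), then apply cumsum to the result. Pass that Series to groupby and call DataFrameGroupBy.cumsum: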
s = df['WDir_Deg'].ne(df['WDir_Deg'].shift()).cumsum()
df['Cumu_Dist'] = df.groupby(s)['Inst_Dist'].cumsum()
print(df)
WDir_Deg Inst_Dist Cumu_Dist
0 289 20 20
1 285 17 17
2 285 19 36
3 287 19 19
4 289 10 10
Details:
print(s)
0 1
1 2
2 2
3 3
4 4
Name: WDir_Deg, dtype: int32
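To see how the helper Series is built, the steps can be separated out. The following is a self-contained sketch on the question's sample data; the intermediate names changed and run_id are illustrative and do not appear in the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    "WDir_Deg": [289, 285, 285, 287, 289],
    "Inst_Dist": [20, 17, 19, 19, 10],
})

# Step 1: True wherever the angle differs from the previous row.
# The first row is compared with NaN, so it always starts a new run.
changed = df["WDir_Deg"].ne(df["WDir_Deg"].shift())

# Step 2: a running count of the True values gives each consecutive run its own label.
run_id = changed.cumsum()

# Step 3: cumulative sum of the distances within each run.
df["Cumu_Dist"] = df.groupby(run_id)["Inst_Dist"].cumsum()

print(changed.tolist())          # [True, True, False, True, True]
print(run_id.tolist())           # [1, 2, 2, 3, 4]
print(df["Cumu_Dist"].tolist())  # [20, 17, 36, 19, 10]
```

Because the labels come from row-to-row changes rather than from the angle values, the two separate runs of 289 get different labels (1 and 4) and are summed independently.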