Pandas cumulative sum depending on another column's value

I have a dataset like this:

Date        Runner  Group   distance [km]
2021-01-01  Joe     1       7            
2021-01-02  Jack    1       6            
2021-01-03  Jess    1       9            
2021-01-01  Paul    2       11           
2021-01-02  Peter   2       12           
2021-01-02  Sara    3       15           
2021-01-03  Sarah   3       10           
 

I want to calculate the cumulative sum of the distance within each group of runners:

Date        Runner  Group   distance [km]   cum sum [km]
2021-01-01  Joe     1       7               7
2021-01-02  Jack    1       6               13
2021-01-03  Jess    1       9               22
2021-01-01  Paul    2       11              11
2021-01-02  Peter   2       12              23
2021-01-02  Sara    3       15              15
2021-01-03  Sarah   3       10              25  

Unfortunately, I have no idea how to do this, and I couldn't find an answer anywhere else. Could someone give me a hint?

import pandas as pd
import numpy as np

df = pd.DataFrame([['2021-01-01','Joe', 1, 7],
                   ['2021-01-02',"Jack", 1, 6],
                   ['2021-01-03',"Jess", 1, 9],
                   ['2021-01-01',"Paul", 2, 11],
                   ['2021-01-02',"Peter", 2, 12],
                   ['2021-01-02',"Sara", 3, 15],
                   ['2021-01-03',"Sarah", 3, 10]],
                  columns=['Date','Runner', 'Group', 'distance [km]'])

Try groupby with cumsum, which keeps a separate running total for each value of Group:

>>> df['cum sum [km]'] = df.groupby('Group')['distance [km]'].cumsum()
>>> df
         Date Runner  Group  distance [km]  cum sum [km]
0  2021-01-01    Joe      1              7             7
1  2021-01-02   Jack      1              6            13
2  2021-01-03   Jess      1              9            22
3  2021-01-01   Paul      2             11            11
4  2021-01-02  Peter      2             12            23
5  2021-01-02   Sara      3             15            15
6  2021-01-03  Sarah      3             10            25
>>>
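
One thing to watch: cumsum accumulates in the order the rows appear, so the result above only reads as a running total over time because each group is already sorted by Date. If your real data might arrive unsorted, a minimal sketch (the shuffled sample rows here are made up for illustration, not taken from the question) would be to sort by Group and Date first:

```python
import pandas as pd

# Hypothetical, deliberately shuffled sample using the question's column names.
df = pd.DataFrame(
    [
        ["2021-01-03", "Jess", 1, 9],
        ["2021-01-01", "Joe", 1, 7],
        ["2021-01-02", "Peter", 2, 12],
        ["2021-01-02", "Jack", 1, 6],
        ["2021-01-01", "Paul", 2, 11],
    ],
    columns=["Date", "Runner", "Group", "distance [km]"],
)

# Parse the dates so that sorting is chronological.
df["Date"] = pd.to_datetime(df["Date"])

# Sort within each group by date, then take the per-group running total.
df = df.sort_values(["Group", "Date"]).reset_index(drop=True)
df["cum sum [km]"] = df.groupby("Group")["distance [km]"].cumsum()

print(df)
```

If you would rather keep the original row order, you can skip reset_index and assign the result back directly, since groupby cumsum returns values aligned to the original index.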