Analytic Sliding Window Function in Python Pandas

I have a table:

import pandas as pd

list_1 = [['2016-01-01', 1, 'King', 1000],
          ['2016-01-02', 1, 'King', -200],
          ['2016-01-03', 1, 'King', 100],
          ['2016-01-04', 1, 'King', -400],
          ['2016-01-05', 1, 'King', 200],
          ['2016-01-06', 1, 'King', -200],
          ['2016-01-01', 2, 'Smith', 1000],
          ['2016-01-02', 2, 'Smith', -300],
          ['2016-01-03', 2, 'Smith', -600],
          ['2016-01-04', 2, 'Smith', 100],
          ['2016-01-05', 2, 'Smith', -100]]
labels = ['a_date', 'c_id', 'c_name', 'c_action']
df = pd.DataFrame(list_1, columns=labels)
print(df)

Output:

        a_date  c_id c_name  c_action
0   2016-01-01     1   King      1000
1   2016-01-02     1   King      -200
2   2016-01-03     1   King       100
3   2016-01-04     1   King      -400
4   2016-01-05     1   King       200
5   2016-01-06     1   King      -200
6   2016-01-01     2  Smith      1000
7   2016-01-02     2  Smith      -300
8   2016-01-03     2  Smith      -600
9   2016-01-04     2  Smith       100
10  2016-01-05     2  Smith      -100

I need to get this table:

a_date      c_id    c_name  c_amount    Balance
2016-01-01     1    King    1000        1000
2016-01-02     1    King    -200        800
2016-01-03     1    King    100         900
2016-01-04     1    King    -400        500
2016-01-05     1    King    200         700
2016-01-06     1    King    -200        500
2016-01-01     2    Smith   1000        1000
2016-01-02     2    Smith   -300        700
2016-01-03     2    Smith   -600        100
2016-01-04     2    Smith   100         200
2016-01-05     2    Smith   -100        100

So I need to add a "Balance" column that holds each customer's running total after every action. This is equivalent to the SQL query:

SELECT *,
        SUM(c_amount) OVER (PARTITION BY c_id ORDER BY a_date) AS 'Balance'
FROM account_actions

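For two customers the solution isn't hard: split the table by c_id, compute the running sum for each part, and merge the parts back. But I need a dynamic solution that works for 10,000 customers...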
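The split-and-merge idea does not have to be written out per customer: looping over the groups produced by groupby handles any number of c_id values. A minimal sketch, assuming the column names from the question; running_balance_by_loop is a hypothetical helper name, and the data is rebuilt so the block runs on its own:

```python
import pandas as pd

rows = [['2016-01-01', 1, 'King', 1000], ['2016-01-02', 1, 'King', -200],
        ['2016-01-03', 1, 'King', 100], ['2016-01-04', 1, 'King', -400],
        ['2016-01-05', 1, 'King', 200], ['2016-01-06', 1, 'King', -200],
        ['2016-01-01', 2, 'Smith', 1000], ['2016-01-02', 2, 'Smith', -300],
        ['2016-01-03', 2, 'Smith', -600], ['2016-01-04', 2, 'Smith', 100],
        ['2016-01-05', 2, 'Smith', -100]]
df = pd.DataFrame(rows, columns=['a_date', 'c_id', 'c_name', 'c_action'])


def running_balance_by_loop(frame):
    """Hypothetical helper: split by c_id, cumsum each part, concatenate back."""
    parts = []
    for _, part in frame.groupby('c_id'):
        # sort_values returns a new DataFrame, so adding a column is safe
        part = part.sort_values('a_date')
        part['balance'] = part['c_action'].cumsum()
        parts.append(part)
    # Glue the per-customer pieces together and restore the original row order
    return pd.concat(parts).sort_index()


looped = running_balance_by_loop(df)

# The same result in a single vectorised call
vectorised = df.sort_values(['c_id', 'a_date']).groupby('c_id')['c_action'].cumsum()
print(looped['balance'].tolist())
```

The loop only spells out what the groupby call does in one step; for thousands of customers the single groupby().cumsum() call is shorter and typically much faster than concatenating many small frames.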

As @Vaishali commented, this is groupby().cumsum(). You may want to call sort_values first to make sure the data is in order, although it already appears to be:

# sort by `c_id` and `a_date`
df = df.sort_values(['c_id','a_date'])

df['balance'] = df.groupby('c_id')['c_action'].cumsum()
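The sort above works because a_date holds ISO-formatted strings ("YYYY-MM-DD"), which happen to sort chronologically as plain text. If the dates might arrive in another format, parsing them first is safer. A minimal sketch, assuming the same column names as the question; here the sort is done on a separate frame and the result is assigned back by index label, so df keeps its original row order:

```python
import pandas as pd

rows = [['2016-01-01', 1, 'King', 1000], ['2016-01-02', 1, 'King', -200],
        ['2016-01-03', 1, 'King', 100], ['2016-01-04', 1, 'King', -400],
        ['2016-01-05', 1, 'King', 200], ['2016-01-06', 1, 'King', -200],
        ['2016-01-01', 2, 'Smith', 1000], ['2016-01-02', 2, 'Smith', -300],
        ['2016-01-03', 2, 'Smith', -600], ['2016-01-04', 2, 'Smith', 100],
        ['2016-01-05', 2, 'Smith', -100]]
df = pd.DataFrame(rows, columns=['a_date', 'c_id', 'c_name', 'c_action'])

# Parse the strings into real timestamps so ordering is chronological
df['a_date'] = pd.to_datetime(df['a_date'])

# Sort a copy by customer, then date (PARTITION BY c_id ORDER BY a_date)
ordered = df.sort_values(['c_id', 'a_date'])

# Running total per customer; assignment aligns the result to df by index label
df['balance'] = ordered.groupby('c_id')['c_action'].cumsum()
print(df)
```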

Output:

        a_date  c_id c_name  c_action  balance
0   2016-01-01     1   King      1000     1000
1   2016-01-02     1   King      -200      800
2   2016-01-03     1   King       100      900
3   2016-01-04     1   King      -400      500
4   2016-01-05     1   King       200      700
5   2016-01-06     1   King      -200      500
6   2016-01-01     2  Smith      1000     1000
7   2016-01-02     2  Smith      -300      700
8   2016-01-03     2  Smith      -600      100
9   2016-01-04     2  Smith       100      200
10  2016-01-05     2  Smith      -100      100
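One difference from the SQL query is worth knowing. In standard SQL, SUM(...) OVER (PARTITION BY ... ORDER BY ...) defaults to a RANGE frame, so rows in the same partition that share an a_date all receive the same total, whereas cumsum() accumulates strictly row by row. The sample data has no such ties, but if yours can, a minimal sketch that mimics the SQL behaviour is to take the last running total within each (c_id, a_date) group. The rows below are hypothetical, chosen only to create a tie on 2016-01-02 for customer 1:

```python
import pandas as pd

# Hypothetical data: customer 1 has two actions on the same day
rows = [['2016-01-01', 1, 'King', 1000],
        ['2016-01-02', 1, 'King', -200],
        ['2016-01-02', 1, 'King', 50],
        ['2016-01-03', 1, 'King', 100],
        ['2016-01-01', 2, 'Smith', 1000],
        ['2016-01-02', 2, 'Smith', -300]]
df = pd.DataFrame(rows, columns=['a_date', 'c_id', 'c_name', 'c_action'])
df = df.sort_values(['c_id', 'a_date'])

# Row-by-row running total (like a ROWS frame)
df['balance_rows'] = df.groupby('c_id')['c_action'].cumsum()

# Like SQL's default RANGE frame: every row sharing (c_id, a_date)
# gets the running total reached after the last of those rows
df['balance_range'] = (df.groupby(['c_id', 'a_date'])['balance_rows']
                         .transform('last'))
print(df)
```

With the tie above, both 2016-01-02 rows for King get 850 in balance_range, while balance_rows shows an intermediate value on the first of them.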