pandas: replace columns with cumulative sum

Question

I have a dataframe grouped by customer_id and month, shown below:

customer_id | month | total
1           | Jan   |  20
            | Feb   |  10
2           | Jan   |  20
3           | Feb   |  30
            | Mar   |  10
            | Apr   |  5

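I want to use the total column to compute, for each customer, a running total of all months up to and including the current one, as shown below: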

customer_id | month | total | cumsum
1           | Jan   |  20   | 20
            | Feb   |  10   | 30
2           | Jan   |  20   | 20
3           | Feb   |  30   | 30
            | Mar   |  10   | 40
            | Apr   |  5    | 45

I tried df.groupby(['customer_id', 'month'])['total'].cumsum() but it didn't work. Can anyone help?

Answer 1

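Starting from your plain dataframe (without grouping it or touching the index), group by customer_id alone: df.groupby('customer_id')['total'].cumsum(). Grouping by both customer_id and month puts every row in its own group, so the sum never accumulates across months.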
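A quick check of why grouping by both columns has no visible effect. This is a minimal sketch that rebuilds the question's data with pd.DataFrame instead of parsing text; the variable names per_row and per_customer are just for illustration.

```python
import pandas as pd

# Same data as in the question: one row per (customer_id, month) pair
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar", "Apr"],
    "total": [20, 10, 20, 30, 10, 5],
})

# Grouping by both keys makes every row a one-row group,
# so each cumulative sum is just that row's own total.
per_row = df.groupby(["customer_id", "month"])["total"].cumsum()
print(per_row.tolist())  # [20, 10, 20, 30, 10, 5], same as total

# Grouping by customer_id alone lets the sum run across that customer's months.
per_customer = df.groupby("customer_id")["total"].cumsum()
print(per_customer.tolist())  # [20, 30, 20, 30, 40, 45]
```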

Example:

import io
import pandas as pd

z = io.StringIO("""customer_id  month  total
1            Jan     20
1            Feb     10
2            Jan     20
3            Feb     30
3            Mar     10
3            Apr     5""")

# Columns are separated by runs of whitespace
df = pd.read_csv(z, sep=r"\s+")

which gives

    customer_id  month      total
0   1            Jan        20
1   1            Feb        10
2   2            Jan        20
3   3            Feb        30
4   3            Mar        10
5   3            Apr        5

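Then take the cumulative sum of total within each customer_id group (selecting the column keeps pandas from also trying to sum the text month column):

df.groupby('customer_id')[['total']].cumsum()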


    total
0   20
1   30
2   20
3   30
4   40
5   45

Then assign it back as a new column:

df['cumsum'] = df.groupby('customer_id')['total'].cumsum()

    customer_id month       total   cumsum
0   1           Jan         20      20
1   1           Feb         10      30
2   2           Jan         20      20
3   3           Feb         30      30
4   3           Mar         10      40
5   3           Apr         5       45
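If the frame is already indexed by (customer_id, month), as the question's display suggests, you can group on the index level instead of a column. This is a minimal sketch under that assumption; the MultiIndex is built by hand here for illustration.

```python
import pandas as pd

# Assumption: customer_id and month live in a MultiIndex, not in columns
idx = pd.MultiIndex.from_tuples(
    [(1, "Jan"), (1, "Feb"), (2, "Jan"), (3, "Feb"), (3, "Mar"), (3, "Apr")],
    names=["customer_id", "month"],
)
df = pd.DataFrame({"total": [20, 10, 20, 30, 10, 5]}, index=idx)

# Group by the customer_id index level; the result keeps the same index,
# so it aligns row by row when assigned back.
df["cumsum"] = df.groupby(level="customer_id")["total"].cumsum()
print(df)
```

In either form, cumsum follows the current row order, so if the months are not already in calendar order within each customer, sort the rows first.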
