pandas replace columns with cumulative sum
I have a data frame grouped by customer_id and month, like this:
customer_id | month | total
1 | Jan | 20
| Feb | 10
2 | Jan | 20
3 | Feb | 30
| Mar | 10
| Apr | 5
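I want to use the total column to compute, for each customer, the cumulative sum over all months up to and including the current one, like this: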
customer_id | month | total | cumsum
1 | Jan | 20 | 20
| Feb | 10 | 30
2 | Jan | 20 | 20
3 | Feb | 30 | 30
| Mar | 10 | 40
| Apr | 5 | 45
I tried df.groupby(['customer_id', 'month'])['total'].cumsum(), but it did not work. Can anyone help?
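Starting from your plain data frame (not grouped, and without touching the index), just run df.groupby('customer_id')['total'].cumsum(). Selecting total explicitly keeps the non-numeric month column out of the cumulative sum.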
Example:
import io
import pandas as pd

z = io.StringIO("""customer_id month total
1 Jan 20
1 Feb 10
2 Jan 20
3 Feb 30
3 Mar 10
3 Apr 5""")
# Parse the whitespace-separated sample into a DataFrame
df = pd.read_table(z, sep=r"\s+")
which yields:
customer_id month total
0 1 Jan 20
1 1 Feb 10
2 2 Jan 20
3 3 Feb 30
4 3 Mar 10
5 3 Apr 5
Then
df.groupby('customer_id')['total'].cumsum()
0 20
1 30
2 20
3 30
4 40
5 45
Name: total, dtype: int64
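For comparison, here is a minimal sketch of why the attempt in the question appears to do nothing. It assumes the same sample data as above, rebuilt inline so the snippet runs on its own. Grouping on both customer_id and month puts every row in its own group, so each cumulative sum has a single term and simply reproduces total.

```python
import pandas as pd

# Sample data copied from the example above
df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar", "Apr"],
    "total": [20, 10, 20, 30, 10, 5],
})

# Each (customer_id, month) pair is unique, so every group holds one row
per_pair = df.groupby(["customer_id", "month"])["total"].cumsum()

# Grouping on customer_id alone accumulates across that customer's rows
per_customer = df.groupby("customer_id")["total"].cumsum()

print(per_pair.tolist())      # [20, 10, 20, 30, 10, 5]
print(per_customer.tolist())  # [20, 30, 20, 30, 40, 45]
```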
Then assign it back:
df['cumsum'] = df.groupby('customer_id')['total'].cumsum()
customer_id month total cumsum
0 1 Jan 20 20
1 1 Feb 10 30
2 2 Jan 20 20
3 3 Feb 30 30
4 3 Mar 10 40
5 3 Apr 5 45
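If the frame really is the grouped result shown in the question, with customer_id and month as index levels rather than ordinary columns, a similar approach should work by grouping on the index level. The sketch below is only an illustration under that assumption: the name grouped and the set_index call used to build the frame are not from the original post.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "month": ["Jan", "Feb", "Jan", "Feb", "Mar", "Apr"],
    "total": [20, 10, 20, 30, 10, 5],
})

# Assumption: the question's frame has a (customer_id, month) MultiIndex
grouped = df.set_index(["customer_id", "month"])

# Group on the index level by name and accumulate within each customer
grouped["cumsum"] = grouped.groupby(level="customer_id")["total"].cumsum()

print(grouped)
```

In both variants the cumulative sum follows the existing row order, so the rows for each customer need to be in chronological order already.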