Pandas groupby and count across multiple columns

Question

I have data sorted by ID and Year, followed by a series of event flags indicating whether something happened for that ID in that year:

ID	Year	x	y	z
1	2015	0	1	0
1	2016	1	1	0
1	2017	0	1	1
2	2015	1	0	1
2	2016	1	1	0
2	2017	0	1	1

I want to group by ID and Year and apply a cumulative count to each of the "event" columns, so that I'm left with something like this:

ID	Year	x_total	y_total	z_total
1	2015	0	1	0
1	2016	1	2	0
1	2017	1	3	1
2	2015	1	0	1
2	2016	2	1	1
2	2017	2	2	2

I've looked at various options using cumsum and cumcount, but I can't seem to figure it out.

Answer 1

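You can use .groupby() + .cumsum() to get the cumulative count for each "event" column. Group by ID only, since the running total has to carry across years within each ID. Then add _total as a suffix to the column names with .add_suffix() and join the result back to the first two columns: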

df[['ID', 'Year']].join(df.groupby('ID')[['x', 'y', 'z']].cumsum().add_suffix('_total'))

Result:

   ID  Year  x_total  y_total  z_total
0   1  2015        0        1        0
1   1  2016        1        2        0
2   1  2017        1        3        1
3   2  2015        1        0        1
4   2  2016        2        1        1
5   2  2017        2        2        2
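
To try this end to end, here is a minimal self-contained sketch that rebuilds the sample data from the question and applies the same approach. The names event_cols, totals and result are illustrative only, and the sort step is an added assumption: cumsum follows row order, so it only matters if the rows are not already sorted by ID and Year.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 2],
    "Year": [2015, 2016, 2017, 2015, 2016, 2017],
    "x":    [0, 1, 0, 1, 1, 0],
    "y":    [1, 1, 1, 0, 1, 1],
    "z":    [0, 0, 1, 1, 0, 1],
})

# Illustrative name: the 0/1 flag columns to accumulate.
event_cols = ["x", "y", "z"]

# cumsum runs in row order, so make sure each ID's rows are in year order.
# (Assumption: a no-op here because the question's data is already sorted.)
df = df.sort_values(["ID", "Year"]).reset_index(drop=True)

# Running total of each flag within each ID, with renamed columns.
totals = df.groupby("ID")[event_cols].cumsum().add_suffix("_total")

# Re-attach the key columns; join aligns on the shared row index.
result = df[["ID", "Year"]].join(totals)
print(result)
```

cumsum is the right tool here because the flags are 0/1 values, so summing them counts the events so far. cumcount, by contrast, only numbers the rows within each group (0, 1, 2, ...) and ignores the column values.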
