How to group a sequence based on a group column and assign a groupID
Below is my dataframe:
ColA ColB Time ColC
A B 01-01-2022 ABC
A B 02-01-2022 ABC
A B 07-01-2022 XYZ
A B 11-01-2022 IJK
A B 14-01-2022 ABC
Desired result:
ColA ColB Time ColC groupID
A B 01-01-2022 ABC 1
A B 02-01-2022 ABC 1
A B 07-01-2022 XYZ 2
A B 11-01-2022 IJK 3
A B 14-01-2022 ABC 4
Update: below is the code I ran using cumsum:
# Flag rows whose ColC differs from the previous row, then number the runs per (ColA, ColB)
df['groupID'] = (df['ColC'].ne(df['ColC'].shift(1))
                 .groupby([df['ColA'], df['ColB']])
                 .cumsum())
ColA ColB Time ColC groupID
A B 01-01-2022 ABC 1
A B 02-01-2022 ABC 1
A B 07-01-2022 XYZ 2
A B 11-01-2022 XYZ 3
A B 14-01-2022 XYZ 4
A B 14-01-2022 XYZ 4
Thanks in advance.
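A minimal, self-contained sketch of the run-based cumsum approach from the update, assuming the five-row sample data from the question. One variation from the posted code: the previous-row comparison is done within each (ColA, ColB) group via groupby(...).shift, so a run never carries over from one group into the next. On this single-group sample it yields the desired 1, 1, 2, 3, 4. Note that it numbers consecutive runs of ColC, so adjacent rows with the same ColC value always share an ID.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ColA": ["A"] * 5,
    "ColB": ["B"] * 5,
    "Time": ["01-01-2022", "02-01-2022", "07-01-2022", "11-01-2022", "14-01-2022"],
    "ColC": ["ABC", "ABC", "XYZ", "IJK", "ABC"],
})

# Previous ColC value *within the same (ColA, ColB) group* (NaN for each group's first row)
prev = df.groupby(["ColA", "ColB"])["ColC"].shift(1)

# True where a new run starts: the value changed, or it is the first row of its group
new_run = df["ColC"].ne(prev)

# Cumulative count of run starts, restarted inside each (ColA, ColB) group
df["groupID"] = new_run.groupby([df["ColA"], df["ColB"]]).cumsum()
print(df)
```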
The logic is not entirely clear, but it looks like you are trying to group by week number (and ColC):
import pandas as pd

# Number (ISO week, ColC) groups in order of first appearance, starting at 1
df['groupID'] = (df
   .groupby([pd.to_datetime(df['Time'], dayfirst=True).dt.isocalendar().week,
             'ColC'], sort=False)
   .ngroup().add(1)
)
Output:
ColA ColB Time ColC groupID
0 A B 01-01-2022 ABC 1
1 A B 02-01-2022 ABC 1
2 A B 07-01-2022 XYZ 2
3 A B 11-01-2022 IJK 3
4 A B 14-01-2022 ABC 4
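To see why the rows land in these groups, the sketch below (assuming the same sample data) exposes the ISO calendar keys. 01-01-2022 and 02-01-2022 fall in ISO week 52 of 2021, so they share a key with ABC; the 11-01-2022 and 14-01-2022 rows share ISO week 2 but differ in ColC. As a variation not in the answer, it groups by ISO year as well as week, so that the same week number in different years is not merged, and it parses the dates with an explicit format instead of dayfirst=True.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ColA": ["A"] * 5,
    "ColB": ["B"] * 5,
    "Time": ["01-01-2022", "02-01-2022", "07-01-2022", "11-01-2022", "14-01-2022"],
    "ColC": ["ABC", "ABC", "XYZ", "IJK", "ABC"],
})

# Parse DD-MM-YYYY strings; isocalendar() returns a DataFrame with year, week, day columns
iso = pd.to_datetime(df["Time"], format="%d-%m-%Y").dt.isocalendar()

# Group by (ISO year, ISO week, ColC); with sort=False, ngroup() numbers groups
# in order of first appearance, and add(1) makes the IDs start at 1
df["groupID"] = (
    df.groupby([iso["year"], iso["week"], df["ColC"]], sort=False)
      .ngroup()
      .add(1)
)
print(df.assign(iso_year=iso["year"], iso_week=iso["week"]))
```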