How to assign a postfix for column values based on ID_number
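DataFrame -> a simple event log with 3 columns.
I want to group my DataFrame by applicationnumber and add a postfix (e.g. _step_1, _step_2, etc.) to the event names within each group. Please see the example below. Could you help me tackle this?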
import pandas as pd
data_example = {'applicationnumber': ['XYZ104183736AA', 'XYZ104183736AA', 'XDASDHGHG54G', 'XDASDHGHG54G', 'XDASDHGHG54G'], 'event_name': ['verification', 'verification', 'verification', 'verification', 'verification'], 'working_time_in_seconds': [1000, 2000, 30000, 10000, 1004]}
df_example = pd.DataFrame(data_example)
Thank you very much!
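You can use groupby.cumcount().
It numbers the rows within each applicationnumber group; convert that counter to a string and append it to event_name: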
# cumcount() starts at 0 within each group, so add 1 to start at step 1
df['event_name'] = (
    df['event_name'].astype(str)
    + "_step_"
    + df.groupby('applicationnumber').cumcount().add(1).astype(str)
)
Output:
applicationnumber event_name working_time_in_seconds
0 XYZ104AA verification_step_1 54365
1 XYZ104AA verification_step_2 35453
2 XDA54G verification_step_1 342
3 XDA54G verification_step_2 52
4 XDA54G verification_step_3 123
I used this sample DataFrame:
>>> df.to_dict()
{'applicationnumber': {0: 'XYZ104AA',
1: 'XYZ104AA',
2: 'XDA54G',
3: 'XDA54G',
4: 'XDA54G'},
'event_name': {0: 'verification',
1: 'verification',
2: 'verification',
3: 'verification',
4: 'verification'},
'working_time_in_seconds': {0: 54365, 1: 35453, 2: 342, 3: 52, 4: 123}}
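For reference, here is a minimal self-contained sketch of the same idea applied to the data from the question (df_example instead of my shortened df). The intermediate name step is only an illustrative helper, and the numbering follows the current row order within each group.

```python
import pandas as pd

data_example = {
    "applicationnumber": ["XYZ104183736AA", "XYZ104183736AA",
                          "XDASDHGHG54G", "XDASDHGHG54G", "XDASDHGHG54G"],
    "event_name": ["verification"] * 5,
    "working_time_in_seconds": [1000, 2000, 30000, 10000, 1004],
}
df_example = pd.DataFrame(data_example)

# cumcount() numbers the rows of each group 0, 1, 2, ... in their current order;
# adding 1 makes the first event of every application "step 1".
step = df_example.groupby("applicationnumber").cumcount() + 1

# Append the per-group counter to the event name as a string suffix.
df_example["event_name"] = df_example["event_name"] + "_step_" + step.astype(str)

print(df_example["event_name"].tolist())
```

If the steps should follow a different order (for example a timestamp column), sort the DataFrame by that column before calling cumcount().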
Update:
import numpy as np

# Keep names that already contain "_step_" unchanged; add the suffix to the rest
df['event_name'] = np.where(
    df['event_name'].str.contains('_step_'),
    df['event_name'],
    df['event_name'].astype(str)
    + "_step_"
    + df.groupby('applicationnumber').cumcount().add(1).astype(str),
)
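To see what the updated version does on mixed data, here is a small sketch with a toy DataFrame in which one row already has a suffix. The application numbers "A" and "B" and the names already_tagged and step are made up for illustration. Note that cumcount() still counts the already-tagged rows, so the counter reflects each row's position within its group.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical): the first row of application "A" is already tagged.
df = pd.DataFrame({
    "applicationnumber": ["A", "A", "B", "B"],
    "event_name": ["verification_step_1", "verification",
                   "verification", "verification"],
})

# True for rows whose name already contains the literal text "_step_".
already_tagged = df["event_name"].str.contains("_step_", regex=False)

# Position of each row within its application, starting at 1, as a string.
step = df.groupby("applicationnumber").cumcount().add(1).astype(str)

# Keep tagged names as they are; append the suffix only to untagged ones.
df["event_name"] = np.where(
    already_tagged,
    df["event_name"],
    df["event_name"] + "_step_" + step,
)

print(df["event_name"].tolist())
```

Running the same assignment a second time leaves the column unchanged, because every row then matches the "_step_" check.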