Enumerate values in a column
Minimal reproducible example:
import numpy as np
import pandas as pd

# Six deliveries from a single over, keyed by a five-level index.
df = pd.DataFrame({'event_name': ['fulham','fulham','fulham','fulham','fulham','fulham'],
                   'batfast_id': ['bfs1', 'bfs1', 'bfs1', 'bfs1', 'bfs1', 'bfs1'],
                   'session_no': [1,1,1,1,1,1],
                   'overs': [0,0,0,0,0,0],
                   'deliveries_faced': [0,1,2,3,4,5],
                   'delivery_type': ['Extra Slow Leg Spin','Extra Slow Leg Spin','Slow Straight','Extra Slow Off Spin','Extra Slow Leg Spin','Extra Slow Leg Spin'],
                   'length': ['Yorker','Yorker','Yorker','Yorker','Yorker','Yorker']},
                  columns=['event_name', 'batfast_id','session_no','overs', 'deliveries_faced','delivery_type','length'])
df = df.set_index(['event_name', 'batfast_id','session_no','overs', 'deliveries_faced'], drop=True)
print(df)
I then use the following code to generate a length/type column, which is a combination of length and delivery_type:
conditions = [
    (df['delivery_type'] == 'Extra Slow Off Spin') & (df['length'] == 'Yorker'),
    (df['delivery_type'] == 'Extra Slow Leg Spin') & (df['length'] == 'Yorker'),
    (df['delivery_type'] == 'Slow Straight') & (df['length'] == 'Yorker'),
]
values = ['ES_OS_Y', 'ES_LS_Y', 'S_S_Y']
# A string default keeps the result dtype consistent with the string choices.
df['length/type'] = np.select(conditions, values, default='')
print(df)
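For readers less familiar with np.select, here is a minimal sketch of how it resolves the conditions: each row takes the value paired with the first condition that is true, and rows matching no condition receive the default. The series contents, the shortened codes, and the 'UNKNOWN' label are invented for illustration and are not part of the question.

```python
import numpy as np
import pandas as pd

# Hypothetical delivery types; 'Fast Inswing' matches none of the conditions.
s = pd.Series(['Slow Straight', 'Extra Slow Off Spin', 'Fast Inswing'])

conditions = [s == 'Extra Slow Off Spin', s == 'Slow Straight']
values = ['ES_OS', 'S_S']

# Rows that satisfy no condition fall back to the default label.
codes = np.select(conditions, values, default='UNKNOWN')
print(codes)
```

Passing a string default explicitly avoids mixing the string choices with the numeric default of 0.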
The problem is that I want to number each delivery from 0 to 5, so that the result looks like this:
                                                               delivery_type  length  length/type
event_name batfast_id session_no overs deliveries_faced
fulham     bfs1       1          0     0                 Extra Slow Leg Spin  Yorker    ES_LS_Y_0
                                       1                 Extra Slow Leg Spin  Yorker    ES_LS_Y_1
                                       2                       Slow Straight  Yorker      S_S_Y_2
                                       3                 Extra Slow Off Spin  Yorker    ES_OS_Y_3
                                       4                 Extra Slow Leg Spin  Yorker    ES_LS_Y_4
                                       5                 Extra Slow Leg Spin  Yorker    ES_LS_Y_5
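Try grouping by every index level except the last one and appending the running count within each group: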
df['length/type'] = df['length/type'] + '_' \
+ df.groupby(df.index.names[:-1]).cumcount().astype(str)
print(df)
# Output:
                                                               delivery_type  length  length/type
event_name batfast_id session_no overs deliveries_faced
fulham     bfs1       1          0     0                 Extra Slow Leg Spin  Yorker    ES_LS_Y_0
                                       1                 Extra Slow Leg Spin  Yorker    ES_LS_Y_1
                                       2                       Slow Straight  Yorker      S_S_Y_2
                                       3                 Extra Slow Off Spin  Yorker    ES_OS_Y_3
                                       4                 Extra Slow Leg Spin  Yorker    ES_LS_Y_4
                                       5                 Extra Slow Leg Spin  Yorker    ES_LS_Y_5
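To show what grouping on all but the last index level does when there is more than one group, here is a small sketch. The two-over frame and the column names code and code_numbered are made up for illustration; the point is only that cumcount restarts at 0 for each distinct combination of the outer index levels.

```python
import pandas as pd

# Hypothetical data: two overs of three deliveries each (not from the original post).
df = pd.DataFrame({
    'overs': [0, 0, 0, 1, 1, 1],
    'deliveries_faced': [0, 1, 2, 0, 1, 2],
    'code': ['ES_LS_Y', 'S_S_Y', 'ES_OS_Y', 'ES_LS_Y', 'ES_LS_Y', 'S_S_Y'],
}).set_index(['overs', 'deliveries_faced'])

# Every index level except the innermost one defines a group.
group_levels = list(df.index.names[:-1])

# cumcount numbers the rows within each group, starting from 0.
counter = df.groupby(group_levels).cumcount()
df['code_numbered'] = df['code'] + '_' + counter.astype(str)
print(df)
```

With a single over, as in the question, there is only one group, so the counter simply runs from 0 to 5.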