Python Data Frame: Create New Column Based on Values in a String Column and a Float Column
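I have the Python data frame below. The "Flag" field is the column I want to create with code.
I would like to do the following:
If "Allocation Type" is Predicted and "Activities_Counter" is greater than 10, I want to create a new column called "Flag" and label the row with 'Flag'.
Otherwise, leave the Flag row blank.
I use the following code to identify/flag where "Activities_Counter" is greater than 10... but I don't know how to incorporate the "Allocation Type" criteria into my code.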
Flag = []
for row in df_HA_noHA_act['Activities_Counter']:
    if row >= 10:
        Flag.append('Flag')
    else:
        Flag.append('')
df_HA_noHA_act['Flag'] = Flag
Any help is greatly appreciated!
You need to add the new condition with &. Using numpy.where is also faster:
import numpy as np

# Wrap the whole expression in parentheses so it can span two lines.
mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')
Sample:
import pandas as pd

df_HA_noHA_act = pd.DataFrame({'Activities_Counter':[10,2,6,15,11,18],
                               'Allocation Type':['Historical','Historical','Predicted',
                                                  'Predicted','Predicted','Historical']})
print(df_HA_noHA_act)
   Activities_Counter Allocation Type
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted
4                  11       Predicted
5                  18      Historical
mask = ((df_HA_noHA_act["Allocation Type"] == 'Predicted') &
        (df_HA_noHA_act['Activities_Counter'] >= 10))
df_HA_noHA_act['Flag'] = np.where(mask, 'Flag', '')
print(df_HA_noHA_act)
   Activities_Counter Allocation Type  Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted  Flag
4                  11       Predicted  Flag
5                  18      Historical
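To see why the condition is combined with & rather than and, the following minimal sketch may help. It rebuilds the same sample data under the shorter name df; the names is_predicted, is_busy and and_failed are made up for illustration only. Each comparison produces one boolean per row, & combines them row by row, and the plain and keyword fails because it asks for a single truth value of a whole Series. The parentheses around each comparison matter too, since & binds more tightly than == and >=.

```python
import numpy as np
import pandas as pd

# Same sample data as above, under a shorter (illustrative) name.
df = pd.DataFrame({
    'Activities_Counter': [10, 2, 6, 15, 11, 18],
    'Allocation Type': ['Historical', 'Historical', 'Predicted',
                        'Predicted', 'Predicted', 'Historical'],
})

# Each comparison yields a boolean Series with one value per row.
is_predicted = df['Allocation Type'] == 'Predicted'
is_busy = df['Activities_Counter'] >= 10

# `&` combines the two Series element by element.
mask = is_predicted & is_busy
flags = np.where(mask, 'Flag', '')

# `and` needs one True/False for each whole Series, so pandas raises.
try:
    is_predicted and is_busy
    and_failed = False
except ValueError:
    and_failed = True

print(mask.tolist())
print(and_failed)
```

Only the rows where both conditions hold end up True in mask, which is what np.where then turns into 'Flag'.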
Slower solution with a loop:
Flag = []
for i, row in df_HA_noHA_act.iterrows():
    if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
        Flag.append('Flag')
    else:
        Flag.append('')
df_HA_noHA_act['Flag'] = Flag
print(df_HA_noHA_act)
   Activities_Counter Allocation Type  Flag
0                  10      Historical
1                   2      Historical
2                   6       Predicted
3                  15       Predicted  Flag
4                  11       Predicted  Flag
5                  18      Historical
Timings:
df_HA_noHA_act = pd.DataFrame({'Activities_Counter':[10,2,6,15,11,18],
                               'Allocation Type':['Historical','Historical','Predicted',
                                                  'Predicted','Predicted','Historical']})
# Repeat the 6 sample rows 1000 times to get a larger frame for timing.
df_HA_noHA_act = pd.concat([df_HA_noHA_act]*1000).reset_index(drop=True)
print(df_HA_noHA_act)
#[6000 rows x 2 columns]
In [187]: %%timeit
     ...: df_HA_noHA_act['Flag1'] = np.where((df_HA_noHA_act["Allocation Type"] == 'Predicted') & (df_HA_noHA_act['Activities_Counter'] >= 10), 'Flag', '')
     ...:
100 loops, best of 3: 1.89 ms per loop

In [188]: %%timeit
     ...: Flag = []
     ...: for i, row in df_HA_noHA_act.iterrows():
     ...:     if (row['Activities_Counter'] >= 10) and (row["Allocation Type"] == 'Predicted'):
     ...:         Flag.append('Flag')
     ...:     else:
     ...:         Flag.append('')
     ...: df_HA_noHA_act['Flag'] = Flag
     ...:
1 loop, best of 3: 381 ms per loop
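The %%timeit magic only works inside IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard timeit module. The function names flag_vectorized and flag_loop and the variables base, df, t_vec and t_loop are illustrative, not part of the original code, and the absolute numbers will differ by machine and library version, so treat the output as a relative comparison only.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({
    'Activities_Counter': [10, 2, 6, 15, 11, 18],
    'Allocation Type': ['Historical', 'Historical', 'Predicted',
                        'Predicted', 'Predicted', 'Historical'],
})
# Repeat the 6 sample rows 1000 times, as in the timings above.
df = pd.concat([base] * 1000).reset_index(drop=True)


def flag_vectorized(frame):
    """Build the flags with a boolean mask and numpy.where."""
    mask = ((frame['Allocation Type'] == 'Predicted') &
            (frame['Activities_Counter'] >= 10))
    return np.where(mask, 'Flag', '')


def flag_loop(frame):
    """Build the flags row by row with iterrows (the slower variant)."""
    flags = []
    for _, row in frame.iterrows():
        if row['Activities_Counter'] >= 10 and row['Allocation Type'] == 'Predicted':
            flags.append('Flag')
        else:
            flags.append('')
    return flags


runs = 3
t_vec = timeit.timeit(lambda: flag_vectorized(df), number=runs)
t_loop = timeit.timeit(lambda: flag_loop(df), number=runs)
print(f'np.where: {t_vec / runs:.4f} s per call')
print(f'iterrows: {t_loop / runs:.4f} s per call')
```

Both functions return the same flags, so the choice between them is only about speed and readability.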