Iterate over Multiple Columns to Find a Value then Create a New Column
My original DataFrame looks like this:
import pandas as pd

data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'], 'Moving_Difficulty': ['111', '111', '112', '111']}
df = pd.DataFrame(data)
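It lists a set of patients and three types of difficulty. '111' means the patient has no difficulty, while the other codes (112, 113, 114) mean they have one.
What I want to do is iterate over the three columns to find patients with at least one difficulty and save the result in a new column, "Difficulty_status", with yes/no values.
The desired output is as follows: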
data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'], 'Hearing_Difficulty': ['112', '111', '112', '113'], 'Moving_Difficulty': ['111', '111', '112', '111'], 'Difficulty_status': ['yes', 'no', 'yes', 'yes']}
df_output = pd.DataFrame(data)
My current attempt is this:
df['Difficylty_status'] = ['yes' if x != '111' else 'no' for x in df['Vision_Difficulty']]
I want to generalize this code to check all three columns (Vision_Difficulty, Hearing_Difficulty, Moving_Difficulty).
Use numpy.where: select the columns whose names contain Difficulty with DataFrame.filter, compare them to '111' with DataFrame.eq, and test whether all values in each row are True with DataFrame.all:
import numpy as np

# 'no' only when every Difficulty column in the row equals '111'
df['Difficulty_status'] = np.where(df.filter(like='Difficulty').eq('111').all(axis=1),
                                   'no',
                                   'yes')
Or use DataFrame.ne to test for inequality and DataFrame.any to test whether at least one value in each row is True, and swap yes and no:
df['Difficulty_status'] = np.where(df.filter(like='Difficulty').ne('111').any(axis=1),
'yes',
'no')
print(df)
  Patient_ID Vision_Difficulty Hearing_Difficulty Moving_Difficulty  \
0          A               111                112               111
1          B               111                111               111
2          C               113                112               112
3          D               114                113               111

  Difficulty_status
0               yes
1                no
2               yes
3               yes
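To make the two variants above easier to compare, here is a minimal self-contained sketch that builds the sample data from the question and checks that both give the same labels. The intermediate names `no_difficulty`, `status_all` and `status_any` are only illustrative. Note that `filter(like='Difficulty')` would also match a `Difficulty_status` column once it exists, so in this sketch the mask is computed before the new column is assigned.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Patient_ID': ['A', 'B', 'C', 'D'],
    'Vision_Difficulty': ['111', '111', '113', '114'],
    'Hearing_Difficulty': ['112', '111', '112', '113'],
    'Moving_Difficulty': ['111', '111', '112', '111'],
})

# Boolean frame: True where a code equals '111' (no difficulty).
# Computed before 'Difficulty_status' exists, so only the three code columns match.
no_difficulty = df.filter(like='Difficulty').eq('111')

# Variant 1: 'no' only when every column in the row is '111'
status_all = np.where(no_difficulty.all(axis=1), 'no', 'yes')

# Variant 2: 'yes' when at least one column in the row differs from '111'
status_any = np.where((~no_difficulty).any(axis=1), 'yes', 'no')

df['Difficulty_status'] = status_all
```

Both arrays hold the same values, since "not all equal to '111'" is the same condition as "at least one not equal to '111'".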
EDIT: If you need to specify the column names to test explicitly, use:
cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].eq('111').all(axis=1), 'no','yes')
Or:
cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].ne('111').any(axis=1), 'yes','no')
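# Loop over the rows; a patient has no difficulty only if all three codes are '111'
df["difficulty_status"] = "NA"
for i in df.index:
    codes = [df.loc[i, "Vision_Difficulty"], df.loc[i, "Hearing_Difficulty"], df.loc[i, "Moving_Difficulty"]]
    if all(code == "111" for code in codes):
        df.loc[i, "difficulty_status"] = "no"
    else:
        df.loc[i, "difficulty_status"] = "yes"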
I'm sure there are many other ways to do this, but let me know whether this works.
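As one of those other ways, here is a rough sketch that avoids both the explicit loop and NumPy by mapping a boolean Series to labels with `Series.map`. It assumes the column names and sample data from the question; `has_difficulty` is just an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Patient_ID': ['A', 'B', 'C', 'D'],
    'Vision_Difficulty': ['111', '111', '113', '114'],
    'Hearing_Difficulty': ['112', '111', '112', '113'],
    'Moving_Difficulty': ['111', '111', '112', '111'],
})
cols = ['Vision_Difficulty', 'Hearing_Difficulty', 'Moving_Difficulty']

# True for rows where at least one code differs from '111'
has_difficulty = df[cols].ne('111').any(axis=1)

# Map the booleans to the yes/no labels
df['Difficulty_status'] = has_difficulty.map({True: 'yes', False: 'no'})
```

Keeping `has_difficulty` as a separate boolean Series can also be handy if you later want to filter patients directly, for example with `df[has_difficulty]`.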