Iterate over Multiple Columns to Find a Value then Create a New Column

My original DataFrame looks like this:

import pandas as pd

data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],
        'Hearing_Difficulty': ['112', '111', '112', '113'], 'Moving_Difficulty': ['111', '111', '112', '111']}

df = pd.DataFrame(data)

It lists a group of patients and three types of difficulty. '111' means the patient has no difficulty, while the other codes (112, 113, 114) mean they do.

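I want to iterate over the three columns to find the patients who have at least one difficulty and save the result to a new column, 'Difficulty_status', with yes/no values.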

The desired output is as follows:

data = {'Patient_ID': ['A', 'B', 'C', 'D'], 'Vision_Difficulty': ['111', '111', '113', '114'],
        'Hearing_Difficulty': ['112', '111', '112', '113'], 'Moving_Difficulty': ['111', '111', '112', '111'],
        'Difficulty_status': ['yes', 'no', 'yes', 'yes']}

df_output = pd.DataFrame(data)

What I have so far is this:

df['Difficulty_status'] = ['yes' if x != '111' else 'no' for x in df['Vision_Difficulty']]

I would like to generalize this code to check all three columns (Vision_Difficulty, Hearing_Difficulty, Moving_Difficulty).

Use numpy.where. Select the columns whose names contain Difficulty with DataFrame.filter, compare them to '111' with DataFrame.eq, and test whether every value in a row is True with DataFrame.all:

import numpy as np

df['Difficulty_status'] = np.where(df.filter(like='Difficulty').eq('111').all(axis=1),
                                   'no',
                                   'yes')

Or use DataFrame.ne, test whether at least one value in a row is True with DataFrame.any, and swap yes and no:

df['Difficulty_status'] = np.where(df.filter(like='Difficulty').ne('111').any(axis=1),
                                   'yes',
                                   'no')

print(df)
  Patient_ID Vision_Difficulty Hearing_Difficulty Moving_Difficulty  \
0          A               111                112               111   
1          B               111                111               111   
2          C               113                112               112   
3          D               114                113               111   

  Difficulty_status  
0               yes  
1                no  
2               yes  
3               yes  
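
One thing to watch for, shown here as a self-contained sketch rather than a definitive fix: filter(like='Difficulty') matches any column whose name contains that substring, so once a 'Difficulty_status' column exists it gets selected too and every row would come out as 'yes' on a rerun. Assuming the three code columns all end in "_Difficulty" (true for the sample data, but an assumption about real data), a regex filter keeps the status column out:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Patient_ID': ['A', 'B', 'C', 'D'],
    'Vision_Difficulty': ['111', '111', '113', '114'],
    'Hearing_Difficulty': ['112', '111', '112', '113'],
    'Moving_Difficulty': ['111', '111', '112', '111'],
})

# Assumption: the code columns are exactly those whose names end in "_Difficulty".
codes = df.filter(regex='_Difficulty$')

# True for a row when at least one column differs from '111'.
has_difficulty = codes.ne('111').any(axis=1)
df['Difficulty_status'] = np.where(has_difficulty, 'yes', 'no')

# Selecting again after the new column exists still returns only the code columns.
codes_again = df.filter(regex='_Difficulty$')
print(list(codes_again.columns))
print(df['Difficulty_status'].tolist())
```

The explicit column list in the edit below avoids the same problem without relying on a naming pattern.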

EDIT: If you need to specify the column names to test explicitly, use:

cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
df['Difficulty_status'] = np.where(df[cols].eq('111').all(axis=1), 'no','yes')

Or:

cols = ['Vision_Difficulty','Hearing_Difficulty','Moving_Difficulty']
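df['Difficulty_status'] = np.where(df[cols].ne('111').any(axis=1), 'yes','no')

df["Difficulty_status"] = "NA"
for i in range(len(df)):
    # Collect the three codes for row i (assumes the default 0..n-1 index).
    codes = [df["Vision_Difficulty"][i], df["Hearing_Difficulty"][i], df["Moving_Difficulty"][i]]
    # "no" only when every code is '111'; any other code means a difficulty.
    if all(code == '111' for code in codes):
        df.loc[i, "Difficulty_status"] = "no"
    else:
        df.loc[i, "Difficulty_status"] = "yes"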

I'm pretty sure there are many other ways to do this, but let me know if this works.
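
As one of those other ways, here is a minimal sketch that stays in pandas and maps the boolean row mask to labels with Series.map instead of numpy.where or an explicit loop. The sample data is copied from the question; the variable names cols and mask are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Patient_ID': ['A', 'B', 'C', 'D'],
    'Vision_Difficulty': ['111', '111', '113', '114'],
    'Hearing_Difficulty': ['112', '111', '112', '113'],
    'Moving_Difficulty': ['111', '111', '112', '111'],
})

cols = ['Vision_Difficulty', 'Hearing_Difficulty', 'Moving_Difficulty']

# Boolean Series: True when any of the listed columns is not '111'.
mask = df[cols].ne('111').any(axis=1)

# Translate True/False into the requested labels.
df['Difficulty_status'] = mask.map({True: 'yes', False: 'no'})
print(df[['Patient_ID', 'Difficulty_status']])
```

On a frame this small the choice between the loop and the vectorized forms is mostly about readability; the vectorized forms tend to scale better as the number of rows grows.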