Dropping a column value in a dataframe when the next column's value is not in a given list

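I have a dataframe in which diagnosis codes are stored in columns prefixed with DIAGX, and each code's diagnosis type is stored in the next column, prefixed with DTYPX. There are 42 such diagnosis code/type pairs.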

I only want to keep a diagnosis code DIAGX when its corresponding (next) diagnosis type DTYPX is in a predefined list, types_to_include.

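For example, pat1 has a diagnosis type DTYPX3 = 1 that I'm not interested in, so I want to replace the diagnosis code DIAGX3 value with NULL or a blank so that I don't include this code later.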

df_patients

import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1,  'M315', 'Y'),
         ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
         ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2', 'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients

Input
patient_num DIAGX1  DTYPX1  DIAGX2  DTYPX2  DIAGX3  DTYPX3  DIAGX4  DTYPX4
pat1        Z509    3       M33     M       M32     1       M315    Y
pat2        I099    3       I278    6       M05     4       F01     2
pat3        N057    3       N057    M       N058    X       N057    X

types_to_include = ['3', 'M', 'W', 'X', 'Y']

Output
patient_num DIAGX1  DTYPX1  DIAGX2  DTYPX2  DIAGX3  DTYPX3  DIAGX4  DTYPX4
pat1        Z509    3       M33     M       NULL    1       M315    Y
pat2        I099    3       NULL    6       NULL    4       NULL    2
pat3        N057    3       N057    M       N058    X       N057    X

import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1,  'M315', 'Y'),
         ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
         ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2', 'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
types_to_include = ['3', 'M', 'W', 'X', 'Y']

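# Boolean mask: True where the diagnosis type is in types_to_include.
# This relies on the i-th DTYPX column pairing with the i-th DIAGX column.
m = df_patients.filter(like='DTYPX').isin(types_to_include).values
# Keep diagnosis codes where the mask is True; replace the rest with 'NULL'
new = df_patients.filter(like='DIAG').where(m, 'NULL')
# Write the masked codes back into the original dataframe in place
df_patients.update(new)
df_patients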
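
The mask approach above pairs columns by position. If the DIAGX and DTYPX columns might not sit in matching order across all 42 pairs, one variation is to pair each DIAGXn with DTYPXn by name. This is only a sketch, assuming every DIAGXn column has a matching DTYPXn column. The helper name mask_excluded_codes and its fill parameter are made up for illustration. It also compares types as strings, which is an assumption about how the mixed 1 / '1' values in the sample data should be treated.

```python
import numpy as np
import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1, 'M315', 'Y'),
            ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
            ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2',
          'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
types_to_include = ['3', 'M', 'W', 'X', 'Y']


def mask_excluded_codes(df, allowed, fill=np.nan):
    """Return a copy of df where DIAGXn is set to `fill` whenever DTYPXn is not in `allowed`.

    Hypothetical helper: assumes each DIAGXn column has a DTYPXn partner.
    """
    out = df.copy()
    diag_cols = [c for c in out.columns if c.startswith('DIAGX')]
    for diag_col in diag_cols:
        # Build the partner column name from the shared numeric suffix
        type_col = 'DTYPX' + diag_col[len('DIAGX'):]
        # Compare as strings so the integer 1 and the string '1' are treated alike
        keep = out[type_col].astype(str).isin(allowed)
        out[diag_col] = out[diag_col].where(keep, fill)
    return out


result = mask_excluded_codes(df_patients, types_to_include)
print(result)
```

Using NaN as the default fill means the excluded codes can be skipped later with isna() or dropna(). Pass fill='NULL' to get the literal string shown in the expected output instead.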