Dropping a column in a dataframe based on the next column value not being a value in a given list
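I have a dataframe in which diagnosis codes are prefixed with DIAGX and diagnosis types are prefixed with DTYPX, with each type stored in the column immediately after its code. There are 42 such diagnosis code/type pairs.
I only want to keep a diagnosis code DIAGX when its corresponding (next) diagnosis type DTYPX is in a predefined list, types_to_include.
For example, pat1 has a diagnosis type I am not interested in, DTYPX3 = 1, so I want to replace the value of the diagnosis code DIAGX3 with NULL or a blank so that I don't include this code later on.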
df_patients
import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1, 'M315', 'Y'),
            ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
            ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2', 'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients
Input
patient_num  DIAGX1  DTYPX1  DIAGX2  DTYPX2  DIAGX3  DTYPX3  DIAGX4  DTYPX4
pat1         Z509    3       M33     M       M32     1       M315    Y
pat2         I099    3       I278    6       M05     4       F01     2
pat3         N057    3       N057    M       N058    X       N057    X
types_to_include = ['3', 'M', 'W', 'X', 'Y']
Output
patient_num  DIAGX1  DTYPX1  DIAGX2  DTYPX2  DIAGX3  DTYPX3  DIAGX4  DTYPX4
pat1         Z509    3       M33     M       NULL    1       M315    Y
pat2         I099    3       NULL    6       NULL    4       NULL    2
pat3         N057    3       N057    M       N058    X       N057    X
import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1, 'M315', 'Y'),
            ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
            ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2', 'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
types_to_include = ['3', 'M', 'W', 'X', 'Y']

# Boolean array: True where a DTYPX value is in types_to_include
m = df_patients.filter(like='DTYPX').isin(types_to_include).values
# Keep each DIAGX value whose paired DTYPX is allowed and replace the rest with 'NULL'.
# The mask is applied by position, so the DIAGX and DTYPX columns must be in the same order.
new = df_patients.filter(like='DIAG').where(m, 'NULL')
# Write the masked DIAGX columns back into the original dataframe in place
df_patients.update(new)
df_patients
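The positional mask above relies on every DIAGX column lining up with its DTYPX column in the same order. As a rough sketch of an alternative, assuming the columns are always named DIAGX<n> and DTYPX<n> with matching suffixes, the pairs could instead be matched by name. The helper name blank_excluded_codes is hypothetical, and two choices here are assumptions rather than part of the question: missing values are written as NaN instead of the string 'NULL', and type values are compared as strings, so the integer 1 is treated like '1'.

```python
import numpy as np
import pandas as pd

patients = [('pat1', 'Z509', '3', 'M33', 'M', 'M32', 1, 'M315', 'Y'),
            ('pat2', 'I099', '3', 'I278', '6', 'M05', '4', 'F01', '2'),
            ('pat3', 'N057', '3', 'N057', 'M', 'N058', 'X', 'N057', 'X')]
labels = ['patient_num', 'DIAGX1', 'DTYPX1', 'DIAGX2', 'DTYPX2',
          'DIAGX3', 'DTYPX3', 'DIAGX4', 'DTYPX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
types_to_include = ['3', 'M', 'W', 'X', 'Y']


def blank_excluded_codes(df, allowed, fill=np.nan):
    """Hypothetical helper: blank each DIAGX<n> whose DTYPX<n> is not allowed."""
    out = df.copy()  # leave the caller's dataframe untouched
    for diag_col in [c for c in out.columns if c.startswith('DIAGX')]:
        # Pair by suffix: DIAGX3 -> DTYPX3 (assumed naming scheme)
        type_col = 'DTYPX' + diag_col[len('DIAGX'):]
        if type_col not in out.columns:
            continue  # no matching type column, so leave the code as-is
        # Compare as strings so mixed int/str type values behave consistently
        keep = out[type_col].astype(str).isin(allowed)
        out[diag_col] = out[diag_col].where(keep, fill)
    return out


result = blank_excluded_codes(df_patients, types_to_include)
print(result)
```

Passing fill='NULL' would reproduce the string placeholder used in the expected output, while NaN tends to work better with later calls such as dropna or notna.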