检查列表变量时如何检查特定列的值并分配加权整数值
How to check specific columns for values and assign weighted integer values when checking against variables of lists
我有一个包含患者诊断列 (DIAGX1-DIAGX42) 的数据集,我需要创建一个变量,根据外部索引的权重对这些列的值求和。
df_patients
patients = [('pat1', 'Z509', 'M33', 'M32', 'M315'),
('pat2', 'I099', 'I278', 'M05', 'F01'),
('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients
Input
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4
pat1 Z509 M33 M32 M315
pat2 I099 I278 M05 F01
pat3 N057 N057 N058 N057
Output
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
pat1 Z509 M33 M32 M315 1
pat2 I099 I278 M05 F01 6
pat3 N057 N057 N058 N057 0
external_index,其中如果上方数据集中的列包含以下任一列中的值,则将添加该值。只有一个成员贡献了一个值,例如 F01
、F02
都在 dementia
中的值只会导致 2
被分配给那个 record/patient,如果它们出现在分组索引中,则值仅为 added/summed,例如F01
=2 和 I099
=1 总和为 3
- congestive_heart_failure = 2
- 痴呆症 = 2
- chronic_pulmonary_disease=1
- rheumatologic_disease = 1
congestive_heart_failure = [
"I099",
"I255",
"I420",
"I425",
"I426",
"I427",
"I428",
"I429",
"I43",
"I50",
"P290",
]
dementia = ["F01", "F02", "F03", "F051", "G30", "G311"]
chronic_pulmonary_disease = [
"I278",
"I279",
"J40",
"J41",
"J42",
"J43",
"J44",
"J45",
"J47",
"J60",
"J61",
"J62",
"J63",
"J64",
"J65",
"J66",
"J67",
"J684",
"J701",
"J703",
]
rheumatologic_disease = [
"M05",
"M06",
"M315",
"M32",
"M33",
"M34",
"M351",
"M353",
"M360",
]
您可以这样做,也可以采用其他一些更有效的方法:
chf_dict = dict(zip(congestive_heart_failure,['chf']*len(congestive_heart_failure)))
dementia_dict = dict(zip(dementia,['dem']*len(dementia)))
cpd_dict = dict(zip(chronic_pulmonary_disease,['cpd']*len(chronic_pulmonary_disease)))
rd_dict = dict(zip(rheumatologic_disease,['rd']*len(rheumatologic_disease)))
disease_map = chf_dict
disease_map.update(dementia_dict)
disease_map.update(cpd_dict)
disease_map.update(rd_dict)
score_dict = {'cpd':1,
'chf':2,
'rd':1,
'dem':2}
score_df = df_patients.set_index('patient_num').stack().map(disease_map)\
.droplevel(1).reset_index(name='disease')\
.drop_duplicates().set_index('patient_num')['disease']\
.map(score_dict)\
.groupby(level=0).sum().rename('Score')
df_patients.merge(score_df, left_on='patient_num', right_index=True)
输出:
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
0 pat1 Z509 M33 M32 M315 1.0
1 pat2 I099 I278 M05 F01 6.0
2 pat3 N057 N057 N058 N057 0.0
注释代码
idx = {
'dementia': dementia,
'rheumatologic_disease': rheumatologic_disease,
'congestive_heart_failure': congestive_heart_failure,
'chronic_pulmonary_disease': chronic_pulmonary_disease,
}
mapping = {v: k for k, vals in idx.items() for v in vals}
weights = {
'dementia': 2,
'rheumatologic_disease': 1,
'congestive_heart_failure': 2,
'chronic_pulmonary_disease': 1,
}
# Convert the dataframe into long format
df = df_patients.melt('patient_num')
# Substitute disease name inplace of codes
df['value'] = df['value'].map(mapping)
# Drop dupes per patient and disease
df = df.drop_duplicates(['patient_num', 'value'])
# Map the weights assigned to diseases
df['value'] = df['value'].map(weights)
# Sum the weights per patient and map it back to original dataframe
df_patients['Score'] = df['patient_num'].map(df.groupby('patient_num')['value'].sum())
结果
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
0 pat1 Z509 M33 M32 M315 1.0
1 pat2 I099 I278 M05 F01 6.0
2 pat3 N057 N057 N058 N057 0.0
我有一个包含患者诊断列 (DIAGX1-DIAGX42) 的数据集,我需要创建一个变量,根据外部索引的权重对这些列的值求和。
df_patients
patients = [('pat1', 'Z509', 'M33', 'M32', 'M315'),
('pat2', 'I099', 'I278', 'M05', 'F01'),
('pat3', 'N057', 'N057', 'N058', 'N057')]
labels = ['patient_num', 'DIAGX1', 'DIAGX2', 'DIAGX3', 'DIAGX4']
df_patients = pd.DataFrame.from_records(patients, columns=labels)
df_patients
Input
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4
pat1 Z509 M33 M32 M315
pat2 I099 I278 M05 F01
pat3 N057 N057 N058 N057
Output
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
pat1 Z509 M33 M32 M315 1
pat2 I099 I278 M05 F01 6
pat3 N057 N057 N058 N057 0
external_index,其中如果上方数据集中的列包含以下任一列中的值,则将添加该值。只有一个成员贡献了一个值,例如 F01
、F02
都在 dementia
中的值只会导致 2
被分配给那个 record/patient,如果它们出现在分组索引中,则值仅为 added/summed,例如F01
=2 和 I099
=1 总和为 3
- congestive_heart_failure = 2
- 痴呆症 = 2
- chronic_pulmonary_disease=1
- rheumatologic_disease = 1
congestive_heart_failure = [
"I099",
"I255",
"I420",
"I425",
"I426",
"I427",
"I428",
"I429",
"I43",
"I50",
"P290",
]
dementia = ["F01", "F02", "F03", "F051", "G30", "G311"]
chronic_pulmonary_disease = [
"I278",
"I279",
"J40",
"J41",
"J42",
"J43",
"J44",
"J45",
"J47",
"J60",
"J61",
"J62",
"J63",
"J64",
"J65",
"J66",
"J67",
"J684",
"J701",
"J703",
]
rheumatologic_disease = [
"M05",
"M06",
"M315",
"M32",
"M33",
"M34",
"M351",
"M353",
"M360",
]
您可以这样做,也可以采用其他一些更有效的方法:
chf_dict = dict(zip(congestive_heart_failure,['chf']*len(congestive_heart_failure)))
dementia_dict = dict(zip(dementia,['dem']*len(dementia)))
cpd_dict = dict(zip(chronic_pulmonary_disease,['cpd']*len(chronic_pulmonary_disease)))
rd_dict = dict(zip(rheumatologic_disease,['rd']*len(rheumatologic_disease)))
disease_map = chf_dict
disease_map.update(dementia_dict)
disease_map.update(cpd_dict)
disease_map.update(rd_dict)
score_dict = {'cpd':1,
'chf':2,
'rd':1,
'dem':2}
score_df = df_patients.set_index('patient_num').stack().map(disease_map)\
.droplevel(1).reset_index(name='disease')\
.drop_duplicates().set_index('patient_num')['disease']\
.map(score_dict)\
.groupby(level=0).sum().rename('Score')
df_patients.merge(score_df, left_on='patient_num', right_index=True)
输出:
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
0 pat1 Z509 M33 M32 M315 1.0
1 pat2 I099 I278 M05 F01 6.0
2 pat3 N057 N057 N058 N057 0.0
注释代码
idx = {
'dementia': dementia,
'rheumatologic_disease': rheumatologic_disease,
'congestive_heart_failure': congestive_heart_failure,
'chronic_pulmonary_disease': chronic_pulmonary_disease,
}
mapping = {v: k for k, vals in idx.items() for v in vals}
weights = {
'dementia': 2,
'rheumatologic_disease': 1,
'congestive_heart_failure': 2,
'chronic_pulmonary_disease': 1,
}
# Convert the dataframe into long format
df = df_patients.melt('patient_num')
# Substitute disease name inplace of codes
df['value'] = df['value'].map(mapping)
# Drop dupes per patient and disease
df = df.drop_duplicates(['patient_num', 'value'])
# Map the weights assigned to diseases
df['value'] = df['value'].map(weights)
# Sum the weights per patient and map it back to original dataframe
df_patients['Score'] = df['patient_num'].map(df.groupby('patient_num')['value'].sum())
结果
patient_num DIAGX1 DIAGX2 DIAGX3 DIAGX4 Score
0 pat1 Z509 M33 M32 M315 1.0
1 pat2 I099 I278 M05 F01 6.0
2 pat3 N057 N057 N058 N057 0.0