How do I use medical codes to determine what disease a person has, using Jupyter?

Question

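I am trying to use medical codes to find out whether a person has a certain disease, and I need help: I have been searching for a few days without finding anything useful. I hope someone can help me with this. Given that I have already imported Excel file 1 into df1 and Excel file 2 into df2, how can I use Excel file 2 to identify which diseases the patients in Excel file 1 have, and indicate each one under its own header? Below is an example of what the data looks like. I am currently using pandas in a Jupyter notebook for this.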

Excel file 1:

Patient	Primary Diagnosis	Secondary Diagnosis	Secondary Diagnosis 2	Secondary Diagnosis 3
Alex	50322	50111
John	50331	60874	50226	74444
Peter	50226	74444
Peter	50233	88888

Excel file 2:

Primary Diagnosis	Medical Code
Diabetes Type 2	50322
Diabetes Type 2	50331
Diabetes Type 2	50233
Cardiovescular Disease	50226
Hypertension	50111
AIDS	60874
HIV	74444
HIV	88888

Expected output:

Patient	Positive for Diabetes Type 2	Positive for Cardiovascular Disease	Positive for AIDS	Positive for HIV
Alex	1	1	0	0
John	1	1	1	1
Peter	1	1	0	1

Answer 1

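Maybe you could convert Excel file 2 into some form of key-value pairs, replace the Primary Diagnosis column in file 1 with the corresponding disease names, and then apply some form of encoding, such as one-hot or something similar, to file 1. I am not sure this approach will necessarily help; I am just sharing my thoughts.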
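The idea above could be sketched roughly as follows. This is only one possible reading of it, not a tested solution for the real files: the two frames are rebuilt by hand from the sample data in the question, the names `code_to_disease`, `long_df`, and `flags` are made up for illustration, and the lookup is applied to every diagnosis column rather than only the primary one (an assumption, since the expected output also counts secondary diagnoses).

```python
import pandas as pd

# Sample data copied from the question; None marks the blank cells.
df1 = pd.DataFrame({
    "Patient": ["Alex", "John", "Peter", "Peter"],
    "Primary Diagnosis": [50322, 50331, 50226, 50233],
    "Secondary Diagnosis": [50111, 60874, 74444, 88888],
    "Secondary Diagnosis 2": [None, 50226, None, None],
    "Secondary Diagnosis 3": [None, 74444, None, None],
})
df2 = pd.DataFrame({
    "Primary Diagnosis": ["Diabetes Type 2", "Diabetes Type 2", "Diabetes Type 2",
                          "Cardiovescular Disease", "Hypertension", "AIDS", "HIV", "HIV"],
    "Medical Code": [50322, 50331, 50233, 50226, 50111, 60874, 74444, 88888],
})

# Key-value pairs: medical code -> disease name.
code_to_disease = dict(zip(df2["Medical Code"], df2["Primary Diagnosis"]))

# One row per filled-in diagnosis cell, with the code replaced by its disease name.
long_df = (
    df1.melt(id_vars="Patient", value_name="code")
       .dropna(subset=["code"])
       .assign(disease=lambda d: d["code"].astype(int).map(code_to_disease))
)

# One-hot encode the disease names, then keep the maximum per patient so that
# repeated rows (Peter appears twice) collapse into a single 0/1 row.
flags = pd.get_dummies(long_df["disease"]).groupby(long_df["Patient"]).max().astype(int)
print(flags)
```

Taking the maximum per patient is a design choice: a patient counts as positive if any of their rows or columns carries a code for that disease.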

Answer 2

IIUC, you can melt df1, map the codes through a Series built by reshaping df2, and finally call pivot_table on the result:

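# lookup Series: medical code -> disease name
diseases = df2.set_index('Medical Code')['Primary Diagnosis']

(df1
 .reset_index()
 # long format: one row per diagnosis cell
 .melt(id_vars=['index', 'Patient'])
 # translate each code to its disease name and flag the row with 1
 .assign(disease=lambda d: d['value'].map(diseases),
         value=1,
        )
 # wide format: one 0/1 column per disease
 .pivot_table(index='Patient', columns='disease', values='value', fill_value=0)
)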

Output:

disease  AIDS  Cardiovescular Disease  Diabetes Type 2  HIV  Hypertension
Patient                                                                  
Alex        0                       0                1    0             1
John        1                       1                1    1             0
Peter       0                       1                1    1             0
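To see why setting value to 1 is enough, it may help to run the same chain one step at a time. The sketch below rebuilds the sample frames from the question by hand (an assumption about how the Excel files load) and uses an illustrative name, `long_df`, for the intermediate table.

```python
import pandas as pd

# Sample data copied from the question; None marks the blank cells.
df1 = pd.DataFrame({
    "Patient": ["Alex", "John", "Peter", "Peter"],
    "Primary Diagnosis": [50322, 50331, 50226, 50233],
    "Secondary Diagnosis": [50111, 60874, 74444, 88888],
    "Secondary Diagnosis 2": [None, 50226, None, None],
    "Secondary Diagnosis 3": [None, 74444, None, None],
})
df2 = pd.DataFrame({
    "Primary Diagnosis": ["Diabetes Type 2", "Diabetes Type 2", "Diabetes Type 2",
                          "Cardiovescular Disease", "Hypertension", "AIDS", "HIV", "HIV"],
    "Medical Code": [50322, 50331, 50233, 50226, 50111, 60874, 74444, 88888],
})

# Step 1: lookup Series, medical code -> disease name.
diseases = df2.set_index("Medical Code")["Primary Diagnosis"]

# Step 2: long format, one row per (original row, diagnosis column) pair.
long_df = df1.reset_index().melt(id_vars=["index", "Patient"])

# Step 3: translate codes to disease names; blank cells stay NaN.
long_df["disease"] = long_df["value"].map(diseases)

# Step 4: overwrite the codes with a constant 1. pivot_table averages by
# default, and the average of any number of 1s is still 1, so duplicates
# (Peter's two rows, or two HIV codes) collapse to a single flag.
long_df["value"] = 1
table = long_df.pivot_table(index="Patient", columns="disease",
                            values="value", fill_value=0)
print(table)
```

Rows whose disease is NaN are left out of the pivot, which is why the blank cells do not need to be dropped beforehand.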

Answer 3

You can use merge and pivot_table:

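out = (
    # long format: one row per (patient, diagnosis column), blanks dropped
    df1.melt('Patient', var_name='Diagnosis', value_name='Medical Code').dropna()
       # attach the disease name for each code and add a constant flag
       .merge(df2, on='Medical Code').assign(dummy=1)
       # wide format: one 0/1 column per disease
       .pivot_table('dummy', 'Patient', 'Primary Diagnosis', fill_value=0)
       # rename the columns to match the requested headers
       .add_prefix('Positive for ').rename_axis(columns=None).reset_index()
)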

Output:

Patient	Positive for AIDS	Positive for Cardiovescular Disease	Positive for Diabetes Type 2	Positive for HIV	Positive for Hypertension
Alex	0	0	1	0	1
John	1	1	1	1	0
Peter	0	1	1	1	0

