How do I assign more than 2 numerical categories in R to a single response?

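I am very new to R. I have a hospital dataset in which patients are assigned categories based on their diagnosis. For example:

and so on, for 8 diseases in total. Here "/" indicates that more than one disease is present. I want to code them numerically, e.g. dis A = 1, dis B = 2, and so on. The data above should then become:

I have tried this with sapply, treating the column as a factor with levels, but the best I could get was a correct classification for rows with a single disease only. Rows with combined diseases return NULL. Is there a way to do this? Please help!

Here is a sample: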

df <- structure(list(Classification = c("IHD/other/cardiopulmonary", 
"IHD", "hypertensive", "IHD/other", "IHD/other", "IHD/other/CVA"
), Comorbidities = c("DM", "HT+DM", "HT+DM", NA, NA, "HT+DM"), 
    Diagnosis = c("CORONARY ARTERY DISEASE WITH MITRAL REGURGITATION WITH TRICUSPID REGURGITATION WITH PULOMNARY HYPERTENSION WITH DYSFUNCTION LEFT VENTRICLE WITH DIABETES MELLITUS", 
    "ACUTE CORONARY SYNDROME WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CARDIOGENIC SHOCK WITH BLEEDING DIATHESIS WITH DIABETES MELLITUS WITH HYPERTENSION", 
    "ASPIRATION PNEUMONTIS WITH RESPIRATORY FALIURE WITH HYPERTENSION WITH HYPONATERMIA WITH DIABETES MELLITUS", 
    "ACUTE CORONARY SYNDROME WITH RIGHT BUNDLE BRANCH BLOCK WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CARDIOGENIC SHOCK", 
    "COMPLETE HEART BLOCK WITH CARDIAC ARREST WITH INTERIOR WALL MYOCARDIAL INFARCTION", 
    "DIABETES MELLITUS WITH CORONARY ARTERY DISEASE WITH HYPERTENSION SYSTEMIC WITH ATRIAL FIBRILATION WITH PULMONARY TUBERCULOSIS WITH CEREBRO VASCULAR ACCIDENT WITH CARDIOGENIC SHOCK"
    )), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

This works:

stringr::str_replace_all(string = c("1 dis A/dis C 2 dis B 3 dis A/dis B/dis C", "dis A/dis B"), pattern = c('dis A' = '1', 'dis B' = '2','dis C' = '3'))
[1] "1 1/3 2 2 3 1/2/3" "1/2"    

Update

Using the sample data:

stringr::str_replace_all(string = df$Classification, pattern = c('IHD' = '1', 'other' = '2','cardiopulmonary' = '3', 'hypertensive' = '4', 'CVA'='5'))
[1] "1/2/3" "1"     "4"     "1/2"   "1/2"   "1/2/5"

So, to update your data, you can do this:

df$Classification <- stringr::str_replace_all(string = df$Classification, pattern = c('IHD' = '1', 'other' = '2','cardiopulmonary' = '3', 'hypertensive' = '4', 'CVA'='5'))
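
One thing to keep in mind: with a named vector, str_replace_all applies the replacements one after another and matches substrings, so a label that happens to be contained in another label could be replaced unintentionally. A minimal base R sketch of an exact-match alternative, which splits on "/" and looks each piece up in a named vector, might look like this. The function name recode_labels and the codes vector are illustrative assumptions, not part of any package.

```r
# Hypothetical lookup table: label -> code (same numbering as the answer above)
codes <- c(IHD = 1, other = 2, cardiopulmonary = 3, hypertensive = 4, CVA = 5)

# The Classification column from the sample data
classification <- c("IHD/other/cardiopulmonary", "IHD", "hypertensive",
                    "IHD/other", "IHD/other", "IHD/other/CVA")

# Split each entry on the separator, look up every piece by its exact name,
# then paste the codes back together with the same separator.
recode_labels <- function(x, codes, sep = "/") {
  parts <- strsplit(x, sep, fixed = TRUE)
  vapply(parts, function(p) paste(codes[p], collapse = sep), character(1))
}

recoded <- recode_labels(classification, codes)
recoded
```

Because the lookup is by whole label, a label missing from codes shows up as "NA" in the result instead of being left partially replaced, which makes gaps in the mapping easy to spot.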

A simple substitution with mgsub from qdap:

dis <- c('dis A','dis B','dis C','dis D','dis E','dis F','dis G','dis H')
class <- 1:8

library(dplyr)
library(qdap)

dt %>% 
  mutate(`Disease classification` = mgsub(dis,class,`Disease classification`))

# Alternatively, keep the original column and write the codes to a new one:
# dt %>% 
#   mutate(`NEW Disease classification` = mgsub(dis,class,`Disease classification`))
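
If adding a package dependency is not desirable, the same idea can be sketched in base R by applying each fixed-string substitution in turn. The data frame dt below is a hypothetical stand-in, since the asker's real data is not shown, and code is just an illustrative name for the replacement vector.

```r
# Hypothetical stand-in for the asker's data (the real `dt` is not shown)
dt <- data.frame(
  `Disease classification` = c("dis A/dis C", "dis B", "dis A/dis B/dis C"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

dis  <- paste("dis", LETTERS[1:8])  # "dis A" ... "dis H"
code <- as.character(1:8)

# Replace each disease label with its code, one label at a time.
recoded <- dt[["Disease classification"]]
for (i in seq_along(dis)) {
  recoded <- gsub(dis[i], code[i], recoded, fixed = TRUE)
}

# Store the result in a new column so the original labels are kept.
dt[["NEW Disease classification"]] <- recoded
dt
```

Using fixed = TRUE treats each label as a literal string, so characters such as "+" or "." in a label are not interpreted as regular expression syntax.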

Here is a base R option that should work for any number of diseases, without having to assign a number to each one manually.

#split the string on '/'
split_vals <- strsplit(df$Classification, '/')
#Get the unique values
all_vals <- unique(unlist(split_vals))
#Use match to get a unique number for each value.
df$Classification <- sapply(split_vals, function(x) 
                            paste(match(x, all_vals),collapse = '/'))
df

# Classification Comorbidities Diagnosis                                                                 
#  <chr>          <chr>         <chr>                                                                     
#1 1/2/3          DM            CORONARY ARTERY DISEASE WITH MITRAL REGURGITATION WITH TRICUSPID REGURGIT…
#2 1              HT+DM         ACUTE CORONARY SYNDROME WITH ANTERIOR WALL MYOCARDIAL INFARCTION WITH CAR…
#3 4              HT+DM         ASPIRATION PNEUMONTIS WITH RESPIRATORY FALIURE WITH HYPERTENSION WITH HYP…
#4 1/2            NA            ACUTE CORONARY SYNDROME WITH RIGHT BUNDLE BRANCH BLOCK WITH ANTERIOR WALL…
#5 1/2            NA            COMPLETE HEART BLOCK WITH CARDIAC ARREST WITH INTERIOR WALL MYOCARDIAL IN…
#6 1/2/5          HT+DM         DIABETES MELLITUS WITH CORONARY ARTERY DISEASE WITH HYPERTENSION SYSTEMIC…
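
Note that with this approach the numbers follow the order in which each disease first appears in the data, so it can help to keep the mapping around for later decoding. A minimal sketch, assuming the same Classification values as in the sample; the names key, encoded and decoded are illustrative only:

```r
# The Classification column from the sample data
classification <- c("IHD/other/cardiopulmonary", "IHD", "hypertensive",
                    "IHD/other", "IHD/other", "IHD/other/CVA")

split_vals <- strsplit(classification, "/", fixed = TRUE)
all_vals   <- unique(unlist(split_vals))

# Lookup table recording which number was given to which disease
key <- data.frame(code = seq_along(all_vals), label = all_vals,
                  stringsAsFactors = FALSE)

# Encode: position of each label in all_vals, pasted back with '/'
encoded <- vapply(split_vals,
                  function(x) paste(match(x, all_vals), collapse = "/"),
                  character(1))

# Decode: turn the numbers back into labels using the lookup table
decoded <- vapply(strsplit(encoded, "/", fixed = TRUE),
                  function(x) paste(key$label[as.integer(x)], collapse = "/"),
                  character(1))

key
```

Saving key alongside the recoded data means the codes stay interpretable even if the rows are later reordered or new data is added.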