Binary variable to multiple records of same key based on characters in another field in R

Question

I have a table of encounters in which the same encounter key (Enc_Key) sometimes has multiple records when there are multiple diagnoses, for example:

Enc_Key | Patient_Key |   Enc_Date   | Diag
  123         789         20160512      asthma
  123         789         20160512      fever
  123         789         20160515      coughing
  546         013         20160226      flu      
  564         444         20160707      laceration
  789         226         20160707      asthma
  789         226         20160707      fever

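I'm trying to create an indicator variable Diag_Ind based on the value of the character variable Diag, but I need it applied across the whole encounter. In other words, if any record has a Diag value containing "asthma", I want to assign a Diag_Ind of "1" to every record with the same Enc_Key, as shown below: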

Enc_Key | Patient_Key |   Enc_Date   | Diag            | Diag_Ind
  123         789         20160512      asthma             1
  123         789         20160512      fever              1
  123         789         20160515      coughing           1
  546         013         20160226      flu                0     
  564         444         20160707      laceration         0
  789         226         20160707      asthma attack      1
  789         226         20160707      fever              1

I can't seem to figure out how to apply this binary indicator to multiple records. I have been using a line of code similar to this:

tbl$Diag_Ind <- ifelse(grepl('asthma',tolower(tbl$Diag)),1,0)

But this only assigns the value "1" to the individual records whose own Diag value matches, for example:

Enc_Key | Patient_Key |   Enc_Date   | Diag            | Diag_Ind
  123         789         20160512      asthma             1
  123         789         20160512      fever              0
  123         789         20160515      coughing           0
  546         013         20160226      flu                0     
  564         444         20160707      laceration         0
  789         226         20160707      asthma attack      1
  789         226         20160707      fever              0

I'm not sure how to apply it to the remaining records that share the same Enc_Key value.

Answer 1

We can use base R's ave to check, within each Enc_Key group, whether any value contains asthma:

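# Flag the rows that mention asthma, then let any() spread a hit to every row of the same Enc_Key.
# Passing the logical vector to ave() keeps the result numeric even when Diag is a character or factor column.
df$Diag_Ind <- as.integer(ave(grepl("asthma", df$Diag), df$Enc_Key, FUN = any))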

df
#  Enc_Key Patient_Key Enc_Date       Diag Diag_Ind
#1     123         789 20160512     asthma        1
#2     123         789 20160512      fever        1
#3     123         789 20160515   coughing        1
#4     546          13 20160226        flu        0
#5     564         444 20160707 laceration        0
#6     789         226 20160707     asthma        1
#7     789         226 20160707      fever        1
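
For anyone who wants to reproduce this, here is a minimal sketch of an alternative that avoids grouping functions altogether: collect the keys of the encounters that have an asthma record, then test every row's key against that set with %in%. The data frame is rebuilt from the question's sample rows; storing Patient_Key as character (to keep the leading zero in "013") and matching case-insensitively are assumptions made for this example, not part of the original answer.

```r
# Sample data copied from the question.
# Assumption: Patient_Key is stored as character so "013" keeps its leading zero.
df <- data.frame(
  Enc_Key     = c(123, 123, 123, 546, 564, 789, 789),
  Patient_Key = c("789", "789", "789", "013", "444", "226", "226"),
  Enc_Date    = c(20160512, 20160512, 20160515, 20160226,
                  20160707, 20160707, 20160707),
  Diag        = c("asthma", "fever", "coughing", "flu",
                  "laceration", "asthma", "fever"),
  stringsAsFactors = FALSE
)

# Row-level flag, as in the question: TRUE where this record mentions asthma.
has_asthma <- grepl("asthma", df$Diag, ignore.case = TRUE)

# Keys of every encounter with at least one asthma record.
asthma_keys <- unique(df$Enc_Key[has_asthma])

# Encounter-level flag: 1 for every row whose key is in that set, else 0.
df$Diag_Ind <- as.integer(df$Enc_Key %in% asthma_keys)

df
```

This yields the same indicator as the ave call, and it makes the two steps explicit: the row-level match the question already had, followed by a lookup on Enc_Key.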

A similar solution with dplyr:

library(dplyr)
df %>%
 group_by(Enc_Key) %>%                                       # one group per encounter
 mutate(Diag_Ind = as.numeric(any(grepl("asthma", Diag))))   # 1 if any row in the group mentions asthma

#   Enc_Key Patient_Key Enc_Date    Diag    Diag_Ind
#    (int)       (int)    (int)     (fctr)    (dbl)
#1     123         789 20160512     asthma        1
#2     123         789 20160512      fever        1
#3     123         789 20160515   coughing        1
#4     546          13 20160226        flu        0
#5     564         444 20160707 laceration        0
#6     789         226 20160707     asthma        1
#7     789         226 20160707      fever        1
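
Two practical details are easy to miss with the dplyr version: the pipeline above only prints its result, and the returned table stays grouped by Enc_Key. A hedged sketch of how one might store the result and drop the grouping follows. It uses a trimmed-down copy of the sample data, takes the "asthma attack" row from the question's desired output, and matches case-insensitively; all of these are choices made for this example rather than part of the original answer.

```r
library(dplyr)

# Trimmed-down sample data (only the columns needed for the indicator).
df <- data.frame(
  Enc_Key = c(123, 123, 123, 546, 564, 789, 789),
  Diag    = c("asthma", "fever", "coughing", "flu",
              "laceration", "asthma attack", "fever"),
  stringsAsFactors = FALSE
)

df <- df %>%
  group_by(Enc_Key) %>%
  # any() collapses the group's matches to one value, which mutate()
  # recycles to every row of that encounter.
  mutate(Diag_Ind = as.integer(any(grepl("asthma", Diag, ignore.case = TRUE)))) %>%
  # Drop the grouping so later steps operate on the whole table again.
  ungroup()

df
```

Assigning back to df keeps the new column, and ungroup() avoids surprises if later summarise or mutate calls are meant to run over all rows.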
