Search through range of variables, identify diseases of interest and return earliest date of disease diagnosis (R)
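I have a df in R with several columns describing the ICD-10 diagnoses assigned to each individual during the study period; the dates of those diagnoses are recorded in separate variables: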
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
Disease_code_1 = c('I802', 'G200','I802','', 'H356'),
Disease_code_2 = c('A071','','G20','','H250'),
Disease_code_3 = c('H250', '','','',''),
Date_of_diagnosis_1 = c('12/06/1997','13/06/1997','14/02/2003','','18/20/2005'),
Date_of_diagnosis_2 = c('12/06/1998','','18/09/2001','','12/07/1993'),
Date_of_diagnosis_3 = c('17/09/2010','','','',''))
    ID Disease_code_1 Disease_code_2 Disease_code_3 Date_of_diagnosis_1 Date_of_diagnosis_2 Date_of_diagnosis_3
1 1001           I802           A071           H250          12/06/1997          12/06/1998          17/09/2010
2 1002           G200                                        13/06/1997
3 1003           I802            G20                         14/02/2003          18/09/2001
4 1004
5 1005           H356           H250                         18/20/2005          12/07/1993
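The date columns hold day/month/year text, and one entry, 18/20/2005, has a month of 20, so it is worth parsing the dates before comparing them. A minimal sketch, re-creating only the date columns from above, that lists the non-empty entries as.Date() cannot parse with the format "%d/%m/%Y":

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                 Date_of_diagnosis_1 = c('12/06/1997', '13/06/1997', '14/02/2003', '', '18/20/2005'),
                 Date_of_diagnosis_2 = c('12/06/1998', '', '18/09/2001', '', '12/07/1993'),
                 Date_of_diagnosis_3 = c('17/09/2010', '', '', '', ''),
                 stringsAsFactors = FALSE)

# All Date_of_diagnosis_* values as one character vector
date_cols <- grep("^Date_of_diagnosis_", names(df), value = TRUE)
raw <- unlist(df[date_cols], use.names = FALSE)

# Parse as day/month/year; empty strings and impossible dates both become NA
parsed <- as.Date(raw, format = "%d/%m/%Y")

# Entries that were filled in but could not be parsed
bad <- raw[raw != "" & is.na(parsed)]
bad
#[1] "18/20/2005"
```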
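I want to search the Disease_code_* variables and return 1 if an individual has been assigned any of the disease codes of interest, as specified in codes_of_interest = c("H250", "H356"), as well as the earliest date on which any code of interest was recorded. Ideally, my df would look like: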
df = data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
Disease_of_interest = c('1','0','0','0','1'),
Date_of_disease_interest = c('17/09/2010','','','','12/07/1993'),
Disease_code_1 = c('I802', 'G200','I802','', 'H356'),
Disease_code_2 = c('A071','','G20','','H250'),
Disease_code_3 = c('H250', '','','',''),
Date_of_diagnosis_1 = c('12/06/1997','13/06/1997','14/02/2003','','18/20/2005'),
Date_of_diagnosis_2 = c('12/06/1998','','18/09/2001','','12/07/1993'),
Date_of_diagnosis_3 = c('17/09/2010','','','',''))
    ID Disease_of_interest Date_of_disease_interest Disease_code_1 Disease_code_2 Disease_code_3 Date_of_diagnosis_1 Date_of_diagnosis_2 Date_of_diagnosis_3
1 1001                   1               17/09/2010           I802           A071           H250          12/06/1997          12/06/1998          17/09/2010
2 1002                   0                                    G200                                        13/06/1997
3 1003                   0                                    I802            G20                         14/02/2003          18/09/2001
4 1004                   0
5 1005                   1               12/07/1993           H356           H250                         18/20/2005          12/07/1993
The code I currently use to identify the disease codes of interest is below (it ignores the diagnosis dates, though):
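codes_of_interest <- c("H250", "H356")
# 1 if any non-ID column of the row contains a code of interest, otherwise 0
df$Disease_of_interest <- apply(df[, -1], 1, function(x) {
  if (any(x %in% codes_of_interest)) {
    return(1)
  } else {
    return(0)
  }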
})
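If only the 0/1 flag is needed, a loop-free sketch is to test each code column with %in% and combine the results with a row-wise OR. It assumes the code columns are the ones whose names start with Disease_code_, so unlike apply(df[, -1], ...) it never looks at the date columns. The result is integer rather than the '1'/'0' strings in the desired output; wrap it in as.character() if strings are needed.

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                 Disease_code_1 = c('I802', 'G200', 'I802', '', 'H356'),
                 Disease_code_2 = c('A071', '', 'G20', '', 'H250'),
                 Disease_code_3 = c('H250', '', '', '', ''),
                 stringsAsFactors = FALSE)
codes_of_interest <- c("H250", "H356")

# Assumes every diagnosis-code column name starts with "Disease_code_"
code_cols <- grep("^Disease_code_", names(df), value = TRUE)

# One logical vector per code column: TRUE where that code is of interest
hits <- lapply(df[code_cols], `%in%`, codes_of_interest)

# Combine the columns with a row-wise OR, then store as 1/0
df$Disease_of_interest <- as.integer(Reduce(`|`, hits))
df$Disease_of_interest
#[1] 1 0 0 0 1
```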
Any suggestions would be much appreciated!
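You can use %in% inside apply to find the positions where codes_of_interest occur, and then use [ inside mapply to take the min of the matching dates. This returns the date if one of codes_of_interest was found, and NA if none was found: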
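# Logical matrix: one row per code column, one column per individual; TRUE where the code is of interest
i <- apply(df[,2:4], 1, "%in%", codes_of_interest)
# For each individual, take the min of the dates in the matching positions, or NA if none match
mapply(function(x, i) if(any(i)) min(x[i]) else NA, asplit(df[,5:7], 1), asplit(i, 2))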
#[1] "17/09/2010" NA NA NA "12/07/1993"
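One caveat with the answer above: min() on the date strings compares them as text, so it only returns the earliest date when alphabetical order happens to agree with chronological order. It does for this sample, but "12/07/1993" would sort before "13/01/1990". A sketch that parses the dates first and builds both new columns, assuming Disease_code_k is always paired with Date_of_diagnosis_k:

```r
df <- data.frame(ID = c(1001, 1002, 1003, 1004, 1005),
                 Disease_code_1 = c('I802', 'G200', 'I802', '', 'H356'),
                 Disease_code_2 = c('A071', '', 'G20', '', 'H250'),
                 Disease_code_3 = c('H250', '', '', '', ''),
                 Date_of_diagnosis_1 = c('12/06/1997', '13/06/1997', '14/02/2003', '', '18/20/2005'),
                 Date_of_diagnosis_2 = c('12/06/1998', '', '18/09/2001', '', '12/07/1993'),
                 Date_of_diagnosis_3 = c('17/09/2010', '', '', '', ''),
                 stringsAsFactors = FALSE)
codes_of_interest <- c("H250", "H356")

# Assumes Disease_code_k is paired with Date_of_diagnosis_k
code_cols <- paste0("Disease_code_", 1:3)
date_cols <- paste0("Date_of_diagnosis_", 1:3)

# Which code slots hold a code of interest (one logical vector per slot)
hits <- lapply(df[code_cols], `%in%`, codes_of_interest)

# Parse each date column; "" and impossible dates such as "18/20/2005" become NA
dates <- lapply(df[date_cols], as.Date, format = "%d/%m/%Y")

# Keep a date only where its paired code matched, otherwise set it to NA
kept <- Map(function(date, hit) { date[!hit] <- NA; date }, dates, hits)

# Row-wise earliest of the kept dates; NA when nothing matched
earliest <- do.call(pmin, c(unname(kept), na.rm = TRUE))

df$Disease_of_interest <- as.integer(Reduce(`|`, hits))
df$Date_of_disease_interest <- format(earliest, "%d/%m/%Y")
df[, c("ID", "Disease_of_interest", "Date_of_disease_interest")]
```

The impossible date 18/20/2005 becomes NA and is skipped by na.rm = TRUE, and individuals without a match get NA rather than an empty string.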