Isolate patients with 2 diagnoses when the diagnosis data is on different lines

Question

I have a patient dataset in which each diagnosis is on a separate row.

Here is an example of what it looks like:

patientID diabetes cancer age gender 
1         1          0     65    M     
1         0          1     65    M     
2         1          1     23    M     
2         0          0     23    M     
3         0          0     50    F     
3         0          0     50    F

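I need to isolate the patients who have been diagnosed with both diabetes and cancer; patientID is their only unique identifier. Sometimes both diagnoses are on the same row and sometimes they are on different rows. I'm not sure how to handle this, because the information is spread across multiple rows.

How can I do this?

Here is what I have so far: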

PROC SQL;
create table want as
select patientID
       , max(diabetes) as diabetes
       , max(cancer) as cancer
       , min(DOB) as DOB
   from diab_dx

   group by patientID;
quit;

data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;

run;

proc freq data=final;
tables both;
run;

Is this correct?

Answer 1

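If you want to see how this works in a DATA step, here is a DOW loop that reads all of a patient's rows in one pass and keeps a running maximum of each flag: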

data pat;
   input patientID diabetes cancer age gender :$1.;
   cards;
1         1          0     65    M     
1         0          1     65    M     
2         1          1     23    M     
2         0          0     23    M     
3         0          0     50    F     
3         0          0     50    F
;;;;
   run;
data both;
   do until(last.patientid);
      set pat; by patientid;
      _diabetes = max(diabetes,_diabetes);
      _cancer   = max(cancer,_cancer);
      end;
   both = _diabetes and _cancer;
   run;
proc print;
   run;
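Because the goal is to isolate those patients, the same loop can use a subsetting IF so that only patients with both diagnoses are written out. This is a minimal sketch that assumes the PAT data set created above, sorted by patientID. The output name BOTH_ONLY is hypothetical.

```sas
/* Assumes PAT from the step above, sorted by patientID.          */
/* BOTH_ONLY is a hypothetical output name.                       */
data both_only;
   do until(last.patientID);
      set pat;
      by patientID;
      /* running maximum of each 0/1 flag across the patient's rows */
      _diabetes = max(diabetes, _diabetes);
      _cancer   = max(cancer, _cancer);
   end;
   /* subsetting IF: keep only patients who have both diagnoses */
   if _diabetes and _cancer;
   /* age and gender are constant within a patient in this data */
   keep patientID age gender;
run;
```

With the sample data this keeps patients 1 and 2: patient 1 has the two diagnoses on different rows, and patient 2 has them on the same row.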

Answer 2

Add a HAVING clause to the end of the SQL query.

PROC SQL;
create table want as
select patientID
     , max(diabetes) as diabetes
     , max(cancer) as cancer
     , min(age) as age
   from PAT
   group by patientID
   having calculated diabetes ge 1 and calculated cancer ge 1;
quit;
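If you need every original diagnosis row for the qualifying patients rather than one summary row per patient, the same condition can go in a subquery. This is a sketch that assumes the PAT data set from Answer 1. The output name BOTH_ROWS is hypothetical. Inside the subquery the aggregates are used directly, so CALCULATED is not needed.

```sas
/* Assumes PAT from Answer 1 (one row per diagnosis record).    */
/* BOTH_ROWS is a hypothetical output name.                     */
proc sql;
   create table both_rows as
   select *
      from pat
      where patientID in
         (select patientID
             from pat
             group by patientID
             /* qualifies if any row has each flag set */
             having max(diabetes) ge 1 and max(cancer) ge 1)
      order by patientID;
quit;
```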

Answer 3

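You may find that some coders, especially those from a statistics background, are more likely to use PROC MEANS than SQL or a DATA step to compute the maximum of the diagnosis flags.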

proc means noprint data=have;
  by patientID;
  output out=want 
    max(diabetes) = diabetes 
    max(cancer) = cancer
    min(age) = age
  ;
run;

Or, when the same statistic applies to every variable:

proc means noprint data=have;
  by patientID;
  var diabetes cancer;
  output out=want max=  ;
run;

Or, using AUTONAME so that each output variable gets the statistic name as a suffix:

proc means noprint data=have;
  by patientID;
  var diabetes cancer age; 
  output out=want max= / autoname;
run;
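PROC MEANS only produces the per-patient maxima, so a short follow-up step is still needed to flag patients with both diagnoses, as in the question's own code. The sketch below assumes WANT came from the AUTONAME version, so the maxima are named diabetes_Max, cancer_Max and age_Max, and the automatic _TYPE_ and _FREQ_ variables are also present.

```sas
/* Assumes WANT from the AUTONAME PROC MEANS step above.         */
data final;
   set want(drop=_type_ _freq_);
   /* a comparison evaluates to 1 (true) or 0 (false) in SAS */
   both = (diabetes_Max ge 1 and cancer_Max ge 1);
run;

proc freq data=final;
   tables both;
run;
```

Because the comparison already yields 1 or 0, no IF/ELSE is needed to build the flag.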
