Isolate Patients with 2 diagnoses but diagnosis data is on different lines
I have a patient dataset where each diagnosis is on a separate line.
Here is an example of what it looks like:
patientID diabetes cancer age gender
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
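I need to isolate the patients who were diagnosed with both diabetes and cancer; their only unique identifier is patientID. Sometimes both diagnoses are on the same line and sometimes they are not. I'm not sure how to do this, since the information is spread over multiple lines.
How can I do this?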
Here is what I have so far:
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(DOB) as DOB
from diab_dx
group by patientID;
quit;
data final; set want;
if diabetes GE 1 AND cancer GE 1 THEN both = 1;
else both =0;
run;
proc freq data=final;
tables both;
run;
Is this correct?
In case you want to see how this looks as a DATA step:
data pat;
input patientID diabetes cancer age gender :$1.;
cards;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;;;;
run;
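data both;
/* DOW loop: each iteration of the DATA step reads all rows for one patient */
do until(last.patientid);
set pat; by patientid;
_diabetes = max(diabetes,_diabetes); /* 1 if any row has diabetes = 1 */
_cancer = max(cancer,_cancer); /* 1 if any row has cancer = 1 */
end;
both = _diabetes and _cancer; /* 1 when both flags are set, else 0 */
run;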
proc print;
run;
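Since the goal is to isolate the qualifying patients rather than just flag them, the DOW loop above can be combined with a subsetting IF. This is a minimal sketch, assuming the data are sorted by patientID; the output name BOTH_ONLY is hypothetical, and the sample data from the question is re-created so the step runs on its own.

```sas
/* Sample data from the question */
data pat;
   input patientID diabetes cancer age gender :$1.;
   datalines;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;
run;

/* Keep one row per patient who has BOTH diagnoses, on the same or different rows. */
/* BOTH_ONLY is a hypothetical output name; PAT must be sorted by patientID.      */
data both_only;
   do until(last.patientID);
      set pat;
      by patientID;
      _diabetes = max(diabetes, _diabetes); /* 1 if any row flags diabetes */
      _cancer   = max(cancer, _cancer);     /* 1 if any row flags cancer   */
   end;
   if _diabetes and _cancer;   /* subsetting IF: output qualifying patients only */
   keep patientID age gender;
run;
```

With the sample data this keeps patients 1 and 2: patient 1 has the two diagnoses on different rows, patient 2 on the same row.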
Add a HAVING clause at the end of the SQL query.
PROC SQL;
create table want as
select patientID
, max(diabetes) as diabetes
, max(cancer) as cancer
, min(age) as age
from PAT
group by patientID
having calculated diabetes ge 1 and calculated cancer ge 1;
quit;
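If you need every original row for the qualifying patients (for example, to see which row carried which diagnosis), the same GROUP BY/HAVING logic can go in a subquery. A minimal sketch; the output name BOTH_DETAIL is hypothetical, and the sample data is re-created so the code runs on its own.

```sas
/* Sample data from the question */
data pat;
   input patientID diabetes cancer age gender :$1.;
   datalines;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;
run;

/* Return ALL rows for patients with at least one diabetes row and one cancer row */
proc sql;
   create table both_detail as
   select *
   from pat
   where patientID in
      (select patientID
       from pat
       group by patientID
       having max(diabetes) ge 1 and max(cancer) ge 1)
   order by patientID;
quit;
```

With the sample data this returns four rows: both rows for patient 1 and both rows for patient 2.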
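You may find that some coders, especially those from a statistics background, are more likely to use PROC MEANS
rather than SQL or a DATA step to compute the maximum of the diagnosis flags.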
proc means noprint data=have;
by patientID;
output out=want
max(diabetes) = diabetes
max(cancer) = cancer
min(age) = age
;
run;
Or, when the same statistic is used for every variable:
proc means noprint data=have;
by patientID;
var diabetes cancer;
output out=want max= ;
run;
Or
proc means noprint data=have;
by patientID;
var diabetes cancer age;
output out=want max= / autoname;
run;
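PROC MEANS can also do the isolating itself: data set options on OUT= drop the automatic _TYPE_ and _FREQ_ variables and, through WHERE=, write only the patients whose maximum flags are both 1. A minimal sketch; WANT_BOTH is a hypothetical name, and the input must be sorted by patientID for the BY statement.

```sas
/* Sample data from the question */
data pat;
   input patientID diabetes cancer age gender :$1.;
   datalines;
1 1 0 65 M
1 0 1 65 M
2 1 1 23 M
2 0 0 23 M
3 0 0 50 F
3 0 0 50 F
;
run;

proc means noprint data=pat;
   by patientID;
   var diabetes cancer;
   /* MAX= keeps the original variable names; WHERE= filters rows as they are written */
   output out=want_both(drop=_type_ _freq_
                        where=(diabetes ge 1 and cancer ge 1))
          max= ;
run;
```

With the sample data, WANT_BOTH holds one row each for patients 1 and 2.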