Delete observations by a range in a variable in SAS
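I have a dataset that I want to subset based on the number of months per ID, where each ID has multiple observations. I only want to keep the observations for IDs whose values of the variable Month cover the full range 1 - 7. For example, in the table below, the only IDs I want to keep are 2 and 4. My original data contains many IDs other than 2 and 4, so I can't just type `if is not ((ID = 2) or (ID = 4)) then delete`.
I've tried using proc iml, the lag function, and nested if-then statements, but I can't seem to get my code to run the way I want. If I'm overlooking an easier approach, please point me in the right direction.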
ID | Month |
---|---|
1 | 1 |
1 | 2 |
1 | 3 |
2 | 1 |
2 | 2 |
2 | 3 |
2 | 4 |
2 | 5 |
2 | 6 |
2 | 7 |
3 | 1 |
3 | 2 |
4 | 1 |
4 | 2 |
4 | 3 |
4 | 4 |
4 | 5 |
4 | 6 |
4 | 7 |
Here is some of the code I've tried:
data work.want;
set work.have;
if first.month < 7
then do;
if id = lag(id)
then delete;
end;
by descending id;
run;
proc iml;
use work.have;
list all where ((1 <= month <= 7) & (account_id = lag(account_id)));
close work.have;
data work.want;
set work.have;
if (first.month < 7) and (account_id = lag(account_id))
then month_total = month_total+lag(month);
by descending id;
run;
There are probably ways to simplify this, but based on what you've posted, this is likely the easiest solution for you to understand and modify.
data have;
infile cards; *data lines below are space-delimited, which is the INPUT default;
input ID Month;
cards;
1 1
1 2
1 3
2 1
2 2
2 3
2 4
2 5
2 6
2 7
3 1
3 2
4 1
4 2
4 3
4 4
4 5
4 6
4 7
;;;;;
run;
data IDS_keep;
set have;
by ID;
retain flag 0;
*for each new id, reset counters and flags;
if first.id then do; counter=0; flag=0; end;
*increment counter and check it matches month value;
counter+1;
*if not, set flag to 1;
if counter ne month then flag=1;
*if last month and last id and flag not flipped then ID is complete;
if month=7 and last.id and flag=0 then output;
*commented out, but uncomment for more efficient processing;
*keep id;
run;
proc sql;
create table want as
select * from have
where id in (select ID from IDS_keep);
quit;
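As an alternative sketch (not part of the answer above), the same filter could probably be written as a single PROC SQL step. It assumes Month holds whole numbers, so an ID counts as complete when its smallest month is 1, its largest is 7, and it has exactly 7 distinct months. The table name want_sql is a hypothetical name chosen here so that want is not overwritten.

```sas
proc sql;
  /* Assumption: Month contains whole numbers only.            */
  /* want_sql is a hypothetical output name for illustration.  */
  create table want_sql as
  select *
  from have
  group by ID
  /* keep every row of an ID whose months cover 1 through 7 */
  having min(Month) = 1
     and max(Month) = 7
     and count(distinct Month) = 7
  order by ID, Month;
quit;
```

Unlike the DATA step approach, this does not need have to be sorted by ID and Month, and it tolerates repeated months within an ID. SAS writes a note to the log about remerging summary statistics with the original data, which is expected for this kind of query.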