How to select rows from a dataset with specific criteria applied to each subset in SAS Enterprise Guide
I have the following table:
COMPANY_NAME | GROUP | COUNTRY | STATUS
COM1 | 1 | DE | DELETED
COM2 | 1 | DE | REMAINING
COM3 | 1 | UK | DELETED
COM4 | 2 | ES | DELETED
COM5 | 2 | FR | DELETED
COM6 | 3 | RO | DELETED
COM7 | 3 | BG | DELETED
COM8 | 3 | ES | REMAINING
COM9 | 3 | ES | DELETED
I need to get:
COMPANY_NAME | GROUP | COUNTRY | STATUS
COM3 | 1 | UK | DELETED
COM4 | 2 | ES | DELETED
COM5 | 2 | FR | DELETED
COM6 | 3 | RO | DELETED
COM7 | 3 | BG | DELETED
So I need every entry with status DELETED, except those where, within the same GROUP, a company in the same COUNTRY has status REMAINING. I can use either PROC SQL or a DATA step.
What I have tried so far is:
PROC SQL;
CREATE TABLE WORK.OUTPUT AS
SELECT *
FROM WORK.INPUT
WHERE STATUS = 'DELETED' AND COUNTRY NOT IN (SELECT COUNTRY FROM WORK.INPUT WHERE STATUS = 'REMAINING');
QUIT;
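But this obviously excludes every country that has a REMAINING row in any group, not just within the same group.
I also tried a DATA step: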
DATA WORK.OUTPUT;
SET WORK.INPUT;
BY GROUP COUNTRY;
IF NOT (STATUS = 'DELETED' AND COUNTRY NOT IN (COUNTRY WHERE STATUS = 'REMAINING')) THEN DELETE;
RUN;
But the syntax is invalid, because I do not know the correct way to write it.
Try this:
proc sql;
    /* String comparisons are case-sensitive, so the literals match the uppercase values in the question */
    select * from your_table
    where status = 'DELETED' and
          catx("_", country, group) not in
          (select catx("_", country, group) from your_table where status = 'REMAINING');
quit;
Output:
COMPANY_NAME | GROUP | COUNTRY | STATUS
COM3 | 1 | UK | DELETED
COM4 | 2 | ES | DELETED
COM5 | 2 | FR | DELETED
COM6 | 3 | RO | DELETED
COM7 | 3 | BG | DELETED
Your attempt shows you were on the right track.
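Another way to express the same per-group condition is a correlated NOT EXISTS subquery, which avoids building a concatenated key. This is only a sketch: it assumes the table and column names from the question (WORK.INPUT, GROUP, COUNTRY, STATUS) and uppercase status values, and has not been run against your data.

```sas
proc sql;
    create table work.output as
    select d.*
    from work.input as d
    where d.status = 'DELETED'
      /* keep the row only if no REMAINING row shares its group and country */
      and not exists (
          select 1
          from work.input as r
          where r.status  = 'REMAINING'
            and r.group   = d.group
            and r.country = d.country
      );
quit;
```

On the sample table this should keep COM3 through COM7, since COM1 and COM9 each have a REMAINING row with the same group and country.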
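A DATA step solution is:
data want(drop = remain_list);
    /* Space-separated list of REMAINING countries in the current group; widen it if a group can hold many */
    length remain_list $ 20;
    /* First pass over the group: collect its REMAINING countries */
    do until(last.group);
        set have;
        by group;   /* have must be sorted by group */
        if status = 'REMAINING' and not find(remain_list, strip(country)) then
            remain_list = catx(' ', remain_list, country);
    end;
    /* Second pass over the same group: output DELETED rows whose country is not in the list */
    do until(last.group);
        set have;
        by group;
        if status = 'DELETED' and not find(remain_list, strip(country)) then
            output;
    end;
run;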