SAS: Determine if value is in list of values grouped by two variables
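I'm looking for a way to group a data set by ID1 and ID2 and then check, within each group, whether Start2 appears among the Start1 values. I've tried proc sql, but I can't get the grouping to work. See the example below: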
ID1 | ID2 | Start1 | Start2 | Comment
1 | 1 | 01-05-2015 | 01-10-2015 | Both Start2 values for ID1 = 1 and ID2 = 1 appear in Start1, so this row should not be output.
1 | 1 | 01-10-2015 | 01-05-2015 | Both Start2 values for ID1 = 1 and ID2 = 1 appear in Start1, so this row should not be output.
1 | 2 | 01-03-2017 | 01-09-2017 | Should be output
1 | 2 | 01-03-2017 | 01-06-2017 | Should be output
1 | 2 | 01-03-2017 | 01-03-2017 |
2 | 1 | 01-05-2015 | 01-10-2016 |
2 | 1 | 01-05-2015 | 01-05-2015 |
2 | 1 | 01-05-2015 | 01-07-2015 | Edited: should be output
2 | 1 | 01-10-2016 | 01-10-2016 |
2 | 1 | 01-10-2016 | 01-05-2015 |
2 | 1 | 01-10-2016 | 01-07-2015 | Edited: should be output
So the output should look like this:
ID1 | ID2 | Start1 | Start2 |
1 | 2 | 01-03-2017 | 01-09-2017 |
1 | 2 | 01-03-2017 | 01-06-2017 |
2 | 1 | 01-05-2015 | 01-07-2015 |
2 | 1 | 01-10-2016 | 01-07-2015 |
Any help would be greatly appreciated! My current attempt looks like this (inspired by Egor below):
proc sql;
  create table want as
    select *
    from have
    group by ID1, ID2
    having not Start2 in (
      select Start1
      from have
      group by ID1, ID2)
  ;
quit;
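This also seems to give the desired output; however, when I tested it on my full-size data set, I found that it misses some rows that should be output.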
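One likely reason rows go missing: the subquery in the attempt is not correlated with the outer row. Start2 is compared with the Start1 values of every row in the table, and the GROUP BY inside the subquery (which has no summary function) does not limit it to the current ID1/ID2 group. A Start2 that happens to equal a Start1 in some other group is therefore dropped. On the sample data that never happens, which is why the output looks right there. The sketch below reproduces the effect on a small hypothetical data set; the names demo, uncorrelated and correlated are illustrative only, and the HAVING is written as a WHERE since no summary function is involved.

```sas
/* Hypothetical data: the Start2 of group (1,1) equals the Start1 of group (1,2) */
data demo;
  input id1 id2 Start1 :ddmmyy10. Start2 :ddmmyy10.;
  format Start1 Start2 ddmmyy10.;
  datalines;
1 1 01-05-2015 01-03-2017
1 2 01-03-2017 01-09-2017
;
run;

proc sql;
  /* Uncorrelated: Start2 is checked against Start1 from ALL rows of demo.
     The (1,1) row is dropped because 01-03-2017 is a Start1 in group (1,2). */
  create table uncorrelated as
    select *
    from demo
    where Start2 not in (select Start1 from demo);

  /* Correlated: the subquery only sees Start1 values from the same (id1, id2)
     group, so both rows are kept. */
  create table correlated as
    select *
    from demo as x
    where x.Start2 not in (select y.Start1
                           from demo as y
                           where y.id1 = x.id1 and y.id2 = x.id2);
quit;
```

Here uncorrelated keeps only the (1,2) row, while correlated keeps both rows, which is the behavior the question asks for.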
Sample data:
data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
1 | 1 | 01-05-2015 | 01-10-2015
1 | 1 | 01-10-2015 | 01-05-2015
1 | 2 | 01-03-2017 | 01-09-2017
1 | 2 | 01-03-2017 | 01-06-2017
1 | 2 | 01-03-2017 | 01-03-2017
2 | 1 | 01-05-2015 | 01-10-2016
2 | 1 | 01-05-2015 | 01-05-2015
2 | 1 | 01-05-2015 | 01-07-2015
2 | 1 | 01-10-2016 | 01-10-2016
2 | 1 | 01-10-2016 | 01-05-2015
2 | 1 | 01-10-2016 | 01-07-2015
;
run;
Attempt 2
Input data:
data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
1 | 1 | 01-05-2015 | 01-10-2015
1 | 1 | 01-10-2015 | 01-05-2015
1 | 2 | 01-03-2017 | 01-09-2017
1 | 2 | 01-03-2017 | 01-06-2017
1 | 2 | 01-03-2017 | 01-03-2017
2 | 1 | 01-05-2015 | 01-10-2016
2 | 1 | 01-05-2015 | 01-05-2015
2 | 1 | 01-05-2015 | 01-07-2015
2 | 1 | 01-10-2016 | 01-10-2016
2 | 1 | 01-10-2016 | 01-05-2015
2 | 1 | 01-10-2016 | 01-07-2015
;
run;
Solution:
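proc sql;
  /* One row per distinct (id1, id2, Start1) combination */
  create table c as
    select distinct id1, id2, Start1
    from a;

  /* Keep rows whose Start2 is not among the Start1 values of the same (id1, id2) group */
  create table b as
    select id1, id2, Start1, Start2
    from a
    where Start2 not in (select Start1 from c where a.id1 = c.id1 and a.id2 = c.id2);
quit;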
Result:
1 2 01/03/2017 01/09/2017
1 2 01/03/2017 01/06/2017
2 1 01/05/2015 01/07/2015
2 1 01/10/2016 01/07/2015
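The intermediate table c is not strictly needed: the correlated subquery can look up table a directly. A minimal sketch of the same filter written with NOT EXISTS in a single PROC SQL step follows. It recreates the sample data set a (space-delimited instead of pipe-delimited) so the block runs on its own; the aliases x and y and the output name b_not_exists are illustrative. On the sample data it keeps the same four rows as the result above, though row order is not guaranteed without an ORDER BY.

```sas
/* Sample data from the question, space-delimited instead of pipe-delimited */
data a;
  input id1 id2 Start1 :ddmmyy10. Start2 :ddmmyy10.;
  format Start1 Start2 ddmmyy10.;
  datalines;
1 1 01-05-2015 01-10-2015
1 1 01-10-2015 01-05-2015
1 2 01-03-2017 01-09-2017
1 2 01-03-2017 01-06-2017
1 2 01-03-2017 01-03-2017
2 1 01-05-2015 01-10-2016
2 1 01-05-2015 01-05-2015
2 1 01-05-2015 01-07-2015
2 1 01-10-2016 01-10-2016
2 1 01-10-2016 01-05-2015
2 1 01-10-2016 01-07-2015
;
run;

proc sql;
  /* Keep a row only if no row in the same (id1, id2) group
     has a Start1 equal to this row's Start2 */
  create table b_not_exists as
    select x.id1, x.id2, x.Start1, x.Start2
    from a as x
    where not exists (
      select *
      from a as y
      where y.id1 = x.id1
        and y.id2 = x.id2
        and y.Start1 = x.Start2
    );
quit;
```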