SAS: Determine if value is in list of values grouped by two variables

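I am looking for a way to group a dataset by ID1 and ID2 and then check, within each group, whether each Start2 value occurs among that group's Start1 values. I have tried proc sql, but I could not get the grouping to work. See the example below: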

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   1   |  01-05-2015   |  01-10-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output. 
 1  |   1   |  01-10-2015   |  01-05-2015  | Both values of Start2 are in Start1 for ID1 = 1 and ID2 = 1 and should not output.  
 1  |   2   |  01-03-2017   |  01-09-2017  | Should output
 1  |   2   |  01-03-2017   |  01-06-2017  | Should output
 1  |   2   |  01-03-2017   |  01-03-2017  | 
 2  |   1   |  01-05-2015   |  01-10-2016  |
 2  |   1   |  01-05-2015   |  01-05-2015  |
 2  |   1   |  01-05-2015   |  01-07-2015  | Edited: Should output
 2  |   1   |  01-10-2016   |  01-10-2016  |
 2  |   1   |  01-10-2016   |  01-05-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  | Edited: Should output

So the output should look like this:

ID1 |  ID2  |    Start1     |    Start2    |
 1  |   2   |  01-03-2017   |  01-09-2017  |
 1  |   2   |  01-03-2017   |  01-06-2017  |
 2  |   1   |  01-05-2015   |  01-07-2015  |
 2  |   1   |  01-10-2016   |  01-07-2015  |

Any help would be greatly appreciated! My current attempt looks like this (inspired by Egor's answer below):

proc sql;
   create table want as 
   select *
   from have
   group by ID1, ID2
   having not Start2 in (
        select Start1
        from have
        group by ID1, ID2)
;

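This also appeared to produce the desired output, but when I tested it on my full-size dataset, I found that it was missing some rows that should have been output.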

Sample data:

data a;
infile cards dlm="|";
format Start1 Start2 DDMMYY10.;
input id1:Best. id2:Best. Start1:DDMMYY10. Start2:DDMMYY10.;
cards;
 1  |   1   |  01-05-2015   |  01-10-2015  
 1  |   1   |  01-10-2015   |  01-05-2015  
 1  |   2   |  01-03-2017   |  01-09-2017  
 1  |   2   |  01-03-2017   |  01-06-2017  
 1  |   2   |  01-03-2017   |  01-03-2017  
 2  |   1   |  01-05-2015   |  01-10-2016  
 2  |   1   |  01-05-2015   |  01-05-2015  
 2  |   1   |  01-05-2015   |  01-07-2015  
 2  |   1   |  01-10-2016   |  01-10-2016  
 2  |   1   |  01-10-2016   |  01-05-2015  
 2  |   1   |  01-10-2016   |  01-07-2015  
 ;
 run;

Attempt 2

Solution:

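proc sql;
   /* Distinct Start1 values within each (id1, id2) group */
   create table c as 
   select distinct id1, id2, Start1
   from a
   ;
   /* Keep rows whose Start2 is not among the Start1 values of the same group */
   create table b as 
   select id1, id2, Start1, Start2
   from a
   where Start2 not in (select Start1 from c where a.id1=c.id1 and a.id2=c.id2)
   ;
quit;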

Result:

1   2   01/03/2017  01/09/2017
1   2   01/03/2017  01/06/2017
2   1   01/05/2015  01/07/2015
2   1   01/10/2016  01/07/2015
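
The key to the solution is that the subquery is tied to the outer row's id1 and id2. The attempt in the question has a subquery with no such condition, so Start2 is compared with the Start1 values of every group, which is the likely reason rows went missing on the full dataset. As a minimal sketch, the same check can also be written in one statement with a correlated NOT EXISTS; this variant is not part of the original answer, and the alias x and the table name b2 are names made up here for illustration.

```sas
proc sql;
   /* b2 and the alias x are hypothetical names chosen for this sketch */
   create table b2 as
   select a.id1, a.id2, a.Start1, a.Start2
   from a
   /* Keep a row only if no row in the same (id1, id2) group
      has a Start1 equal to this row's Start2 */
   where not exists (
      select 1
      from a as x
      where x.id1 = a.id1
        and x.id2 = a.id2
        and x.Start1 = a.Start2
   );
quit;
```

On the sample data this should select the same four rows as table b, without the intermediate table c.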