Removing rows between two values in SAS

Question

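For the data below, I am trying to filter the rows within each ID group according to these conditions:

After every row where type='B' and value='Y', do the following:
- delete rows until the next row that has type='F' and value='Y'.
If a group has no row with type='B' and value='Y', keep all of its rows (e.g. id=002).

Can we create a flag variable like the one shown in the dataset I want? That way I could filter on Flag='Y'.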

Have

  ID     Type     Date         Value
  001     F       1/2/2018      Y
  001     B       1/3/2018
  001     B       1/4/2018      Y
  001     B       1/5/2018
  001     B       1/6/2018
  001     F       1/6/2018      Y
  001     B       1/6/2018      
  001     B       1/7/2018
  001     B       1/8/2018      Y
  001     B       1/8/2018
  001     B       1/9/2018
  002     F       1/2/2018      Y
  002     B       1/3/2018
  002     B       1/4/2018

Want

  ID     Type     Date         Value   Flag
  001     F       1/2/2018      Y       Y
  001     B       1/3/2018              Y
  001     B       1/4/2018      Y       Y 
  001     B       1/5/2018
  001     B       1/6/2018
  001     F       1/6/2018      Y       Y
  001     B       1/6/2018              Y
  001     B       1/7/2018              Y
  001     B       1/8/2018      Y       Y 
  001     B       1/8/2018
  001     B       1/9/2018
  002     F       1/2/2018      Y       Y
  002     B       1/3/2018              Y
  002     B       1/4/2018              Y

I tried the following:

data F;
set have;
where Type='F';run;

data B;
 set have;
 where Type='B';run;

 proc sql;
  create table all as select
  a.* from B as b
  inner join F as f
  on a.id=b.id
  and b.date >= a.date;
quit;

This returns all the rows in my dataset. Any help is greatly appreciated.

Answer 1

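Like Richard, I don't quite understand what the filter criteria are.

I did spot one problem with your join. You use a.* in the select statement, but your dataset aliases are "b" and "f". That will not work, because no dataset is assigned to the alias "a".

The corrected version is as follows: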

proc sql;
  create table all as 
  select b.* from B as b
  inner join F as f
  on b.id=f.id
  and b.date >= f.date;
quit;

However, even with that fix, I don't think an inner join is the right approach to your problem. Could you tell us your filter criteria?

Answer 2

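I have a solution, but it is not the most elegant (and it may not cover edge cases). If anyone else has a better solution, please share.

First, create the dataset in case anyone else wants to try: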

data work.have;
  infile datalines truncover;  /* short lines leave Value missing */
  input @01 ID    3.
        @05 Type  $1.
        @07 Date  date7.
        @15 Value $1.;
  format ID    3.
         Type  $1.
         Date  date11.
         Value $1.;
  datalines;
001 F 02Jan18 Y
001 B 03Jan18
001 B 04Jan18 Y
001 B 05Jan18
001 B 06Jan18
001 F 06Jan18 Y
001 B 06Jan18
001 B 07Jan18
001 B 08Jan18 Y
001 B 08Jan18
001 B 09Jan18
002 F 02Jan18 Y
002 B 03Jan18
002 B 04Jan18
;
run;

Solution: I based this on the suggestion in your edit to create a flag variable.

Data Flag;
  set work.have;

  if Type = 'B' and Value = 'Y' then
    flag + 1;
  if Type = 'F' then
    flag = 0;

  if Value ne 'Y' and flag = 1 then delete;
run;

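The flag variable starts at 0. Because it is created by a sum statement (flag + 1), SAS retains its value from one row to the next.

The first IF-THEN statement identifies rows with Type='B' and Value='Y', sets the flag to 1, and that value carries over to the following rows.

The second IF-THEN statement identifies Type='F' rows and resets the flag to 0.

The last IF-THEN statement deletes every row where flag=1, except the Type='B', Value='Y' row that set the flag.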
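The step above carries its flag across ID groups and only deletes while the flag is exactly 1, so a second B/Y row or a group that does not start with an F row could change the result. A minimal sketch of one way to tighten this is shown below. It assumes work.have is already in ID order with rows in the intended sequence, and it follows the question's wording by ending the skipping only at a row with Type='F' and Value='Y'. The names work.filtered and skip are hypothetical, not part of the original answer.

```sas
/* Sketch only: delete rows after a B/Y row until the next F/Y row,  */
/* restarting the logic for every ID. Assumes work.have is sorted    */
/* by ID. work.filtered and skip are hypothetical names.             */
data work.filtered(drop=skip);
  set work.have;
  by ID;
  retain skip 0;

  /* a new ID group, or an F/Y row, ends the skipping */
  if first.ID or (Type = 'F' and Value = 'Y') then skip = 0;

  /* while skipping, drop the row (including any further B/Y rows) */
  if skip then delete;

  /* the B/Y row itself is kept; skipping starts with the next row */
  if Type = 'B' and Value = 'Y' then skip = 1;
run;
```

On the sample data this keeps the ten rows marked Flag='Y' in the question's "Want" table.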

I hope this works for your problem.

Answer 3

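The criteria for computing the row state of a contiguous sub-group (call it a 'run' of rows) within an ID group are relatively simple, but a compromised state can occur, or be signalled, if some interesting data cases arise:

- Two or more B Y rows before an F Y (an extra 'run ending')
- Two or more F Y rows before a B Y (a 'run starting' inside a run)
- The first row in a group is not F Y (the 'run starting' is not the first row of the group)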

data want(drop=run_:);
  SET have;
  BY id;

  run_first = (type='F' and value='Y');
  run_final = (type='B' and value='Y');

  * set flag state at criteria for start of contiguous sub-group criteria;
  run_flag + run_first;

  if first.id and NOT run_flag then
    put 'WARNING: first row in group ' id= ' is not F Y, this may be incorrect';

  if run_flag > 1 and run_first then 
    put 'WARNING: an additional F Y before a B Y at row ' _n_;

  if run_flag then
    OUTPUT;

  if run_flag = 0 and run_final then 
    put 'WARNING: an additional B Y before a F Y at row ' _n_;

  * reset flag at criteria for contiguous sub-group;
  if last.id or run_final then 
    run_flag = 0;
run;
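The step above outputs only the rows that belong to a run. If the goal is the Flag column from the question's "Want" table, with every row kept, a sketch along the same lines could look like this. It assumes have is sorted by id and uses the same run definition (F Y starts a run, B Y ends it). The names want_flagged and in_run are hypothetical, and the warning checks are left out for brevity.

```sas
/* Sketch only: keep all rows and mark rows inside a run with Flag='Y'. */
/* Assumes HAVE is sorted by id; want_flagged and in_run are            */
/* hypothetical names.                                                  */
data want_flagged(drop=in_run);
  set have;
  by id;
  length Flag $1;   /* new variable, blank again at the start of each row */
  retain in_run 0;

  if first.id then in_run = 0;                     /* runs do not span groups */
  if type = 'F' and value = 'Y' then in_run = 1;   /* F Y starts a run */

  if in_run then Flag = 'Y';

  if type = 'B' and value = 'Y' then in_run = 0;   /* B Y ends the run after being flagged */
run;

/* filter on the flag, as the question intends */
proc print data=want_flagged;
  where Flag = 'Y';
run;
```

Keeping the flag as a column makes it possible to inspect which rows would be dropped before actually filtering them out.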

Tags: sas, retain, proc-sql, delete-row