Identifying first occurrence after trigger event

I have a large panel dataset that looks something like this:

data have;
   input id t a b ;
datalines;
1 1 0 0
1 2 0 0
1 3 1 0
1 4 0 0
1 5 0 1
1 6 1 0
1 7 0 0
1 8 0 0
1 9 0 0
1 10 0 1
2 1 0 0
2 2 1 0
2 3 0 0
2 4 0 0
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 0
2 10 0 1
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 1 0
3 8 0 0
3 9 0 0
3 10 0 0
;
run;

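For each ID, I want to record every 'trigger' event, i.e. whenever a=1, and then record how long it takes until the next occurrence of b=1. The final output should look like this: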

data want;
  input id a_no a_t b_t diff ;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;

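Getting all the a=1 and b=1 events is no problem, of course, but since this is a very large dataset with many of both events per ID, I am looking for an elegant and straightforward solution. Any ideas?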

An elegant DATA step approach uses nested DOW loops. Once you understand the DOW loop, it is straightforward.

data want(keep=id--diff);
  length id a_no a_t b_t diff 8;
  do until (last.id);                            * process each group;
    do a_no = 1 by 1 until (last.id);            * counter for each output;
      do until (output_condition or last.id);    * read rows until a trigger is resolved or the group ends;

        SET have;                  * read data;
        by id;                     * set up first. and last. variables for the group;

        if a=1 then a_t = t;       * detect and record start of trigger state;

        output_condition = (b=1 and not missing(a_t) and t > a_t);  * b=1 after a pending trigger;
      end;

      if output_condition then do;
        b_t = t;                     * compute remaining info at output point;
        diff = b_t - a_t;

        OUTPUT;

        a_t = .;       * reset trigger state tracking variables;
        b_t = .;
        diff = .;
      end;
      else if not missing(a_t) then
        OUTPUT;        * group ended with a trigger but no following b=1;
    end;
  end;
run;
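For readers new to the pattern, the nested loops above build on the basic DOW loop: an explicit `do until (last.id)` wrapped around `SET` and `BY`, so that each implicit DATA step iteration consumes exactly one BY group. Below is a minimal sketch, assuming the HAVE dataset from the question already exists; the output name `trigger_counts` and the variable `n_triggers` are illustrative only. It simply counts the a=1 triggers per ID:

```sas
/* Minimal DOW-loop sketch (illustrative names): one implicit DATA step */
/* iteration per ID group. Assumes HAVE from the question exists.       */
data trigger_counts(keep=id n_triggers);
  n_triggers = 0;              /* reset the per-group counter            */
  do until (last.id);          /* read every row of the current ID       */
    set have;
    by id;
    n_triggers + (a = 1);      /* sum statement: add 1 whenever a=1      */
  end;
  output;                      /* one row per ID once the group is read  */
run;
```

With the sample data this should give one row per ID: 2, 2 and 1 triggers for IDs 1, 2 and 3. The nested version adds an inner loop that stops early at each resolved trigger instead of only at `last.id`.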

Note: an SQL approach (not shown) could use a self-join within each group.

Here is a fairly simple SQL approach that gives more or less the desired output:

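proc sql;
create table want
  as select
    t1.id,
    t1.t as a_t,
    t2.t as b_t,
    t2.t - t1.t as diff
    from
      have(where = (a=1)) t1
      left join
      have(where = (b=1)) t2
    on
      t1.id = t2.id
      and t2.t > t1.t
    group by t1.id, t1.t
    having diff = min(diff)    /* keep only the nearest later b=1 for each trigger */
    order by t1.id, t1.t       /* sorted for the BY-group step that follows */
    ;
quit;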

The only missing piece is a_no. Generating this kind of running row number consistently in SQL takes a lot of work, but it is simple with one extra DATA step:

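data want;
 length id a_no a_t b_t diff 8;   * put a_no in the same column position as the expected output;
 set want;
 by id;
 if first.id then a_no = 0;       * restart the counter for each ID;
 a_no + 1;                        * sum statement, so a_no is retained across rows;
run;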
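Either answer can be checked against the expected table from the question. One way, sketched here, is PROC COMPARE. This assumes WANT has already been created by one of the approaches above, and stores the expected rows under a hypothetical name, `want_expected`, so they do not overwrite it:

```sas
/* Expected result typed in from the question (hypothetical dataset name) */
data want_expected;
  input id a_no a_t b_t diff;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;

/* Match rows on ID and A_NO, then compare every variable common to both */
proc compare base=want_expected compare=want;
  id id a_no;
run;
```

The ID statement requires both datasets to be sorted by id and a_no. Column order does not matter to PROC COMPARE, only variable names and values.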