Identifying first occurrence after trigger event
I have a large panel data set that looks something like this:
data have;
input id t a b ;
datalines;
1 1 0 0
1 2 0 0
1 3 1 0
1 4 0 0
1 5 0 1
1 6 1 0
1 7 0 0
1 8 0 0
1 9 0 0
1 10 0 1
2 1 0 0
2 2 1 0
2 3 0 0
2 4 0 0
2 5 0 1
2 6 0 1
2 7 0 1
2 8 0 1
2 9 1 0
2 10 0 1
3 1 0 0
3 2 0 0
3 3 0 0
3 4 0 0
3 5 0 0
3 6 0 0
3 7 1 0
3 8 0 0
3 9 0 0
3 10 0 0
;
run;
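For each ID, I want to record every 'trigger' event, i.e. each row where a=1, and then record how long it takes until the next occurrence of b=1. The final output should look like this: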
data want;
input id a_no a_t b_t diff ;
datalines;
1 1 3 5 2
1 2 6 10 4
2 1 2 5 3
2 2 9 10 1
3 1 7 . .
;
run;
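Pulling out all the a=1 and b=1 events is of course no problem, but because this is a very large data set with many of both events per ID, I am looking for an elegant and straightforward solution. Any ideas?
An elegant DATA step approach is to use nested DOW loops. Once you understand DOW loops, it is straightforward.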
data want(keep=id--diff);
  length id a_no a_t b_t diff 8;
  do until (last.id);                              * process each group;
    do a_no = 1 by 1 until (last.id);              * counter for each output row;
      do until (output_condition or last.id);      * read until the trigger is answered or the group ends;
        set have;                                  * read data;
        by id;                                     * set up first. and last. variables for the group;
        if a=1 then a_t = t;                       * detect and record start of trigger state;
        output_condition = (b=1 and t > a_t > 0);  * true only when a trigger is pending and b=1 follows it;
      end;
      if output_condition then do;
        b_t = t;                                   * compute remaining info at output point;
        diff = b_t - a_t;
        output;
        a_t = .;                                   * reset trigger state tracking variables;
        b_t = .;
      end;
      else if not missing(a_t) then
        output;                                    * group ended with a trigger that was never answered;
    end;
  end;
run;
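The test `t > a_t > 0` in the step above does two jobs at once, which is easy to miss. SAS evaluates a chained comparison as `t > a_t and a_t > 0`, and a missing value sorts below every number, so the test is false whenever no trigger is pending. A minimal sketch to see this in the log (the variable names `cond_missing` and `cond_set` are made up for this illustration):

```sas
data _null_;
  t = 5;

  a_t = .;                        /* no trigger recorded yet */
  cond_missing = (t > a_t > 0);   /* (5 > .) is true, (. > 0) is false */

  a_t = 3;                        /* trigger recorded at t=3 */
  cond_set = (t > a_t > 0);       /* (5 > 3) and (3 > 0) are both true */

  put cond_missing= cond_set=;    /* log: cond_missing=0 cond_set=1 */
run;
```

One consequence is that a trigger at t=0 or at a negative t would never satisfy the test, so this relies on t being positive, as it is in the sample data.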
Note: an SQL approach (not shown) could use a self-join within each group.
Here is a fairly simple SQL approach that gives more or less the desired output:
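proc sql;
  create table want as
  select
    t1.id,
    t1.t as a_t,
    t2.t as b_t,
    t2.t - t1.t as diff
  from
    have(where = (a=1)) t1
    left join
    have(where = (b=1)) t2
  on
    t1.id = t2.id
    and t2.t > t1.t
  group by t1.id, t1.t
  having diff = min(diff)   /* keep only the nearest later b=1 for each trigger */
  order by id, a_t          /* explicit sort so a later BY id step is safe */
  ;
quit;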
The only missing piece is a_no. Generating that kind of per-group row counter consistently in SQL takes a lot of work, but it is trivial with one extra DATA step:
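data want;
  set want;                     * assumes want is sorted by id, then a_t;
  by id;
  if first.id then a_no = 0;    * restart the counter for each id;
  a_no + 1;                     * sum statement, so a_no is retained across rows;
run;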
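One difference between the two answers is worth checking against your own data: when several a=1 rows occur before the next b=1, the join pairs each of them with that b=1 row, while the DOW loop keeps only the most recent a_t. A minimal sketch, assuming a made-up group (id 4 and the data set name `have_extra` are hypothetical, not part of the question's data), and using `min(t2.t)` directly as a variant of the query above:

```sas
/* Hypothetical group: two triggers (t=1, t=2) before a single b=1 (t=3) */
data have_extra;
  input id t a b;
  datalines;
4 1 1 0
4 2 1 0
4 3 0 1
;
run;

proc sql;
  select
    t1.id,
    t1.t             as a_t,
    min(t2.t)        as b_t,    /* nearest later b=1 for this trigger */
    min(t2.t) - t1.t as diff
  from
    have_extra(where = (a=1)) t1
    left join
    have_extra(where = (b=1)) t2
  on
    t1.id = t2.id
    and t2.t > t1.t
  group by t1.id, t1.t;
quit;
```

This returns two rows for id 4 (a_t=1 with diff=2, and a_t=2 with diff=1), whereas the DOW step would emit only the second. Which behaviour is wanted depends on whether a repeated trigger should restart the clock.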