SAS calculation of duration when a variable takes on a specific value

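I have a dataset in which I want to calculate, in SAS, the duration of multiple welfare spells for each id.

Start is defined as a variable y_xxxx taking the value 'welfare' where the preceding 4 y_xxxx are not equal to 'welfare', for each id.

End is defined as a variable y_xxxx taking the value 'welfare' where the following 4 y_xxxx are not equal to 'welfare'. If the following 4 y_xxxx have the value 'other', then this spell must be deleted, not the whole observation.

Duration = end - start + 1

Each id can have multiple 'welfare' spells that satisfy the restrictions above. The data look like this (except that in the real dataset the variables y_xxxx are recorded up to y_1548).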

ID  y_0950   y_0951   y_0952   y_0953   y_1001    y_1002    y_1003   ...  y_1015
01  other    other    other    other    welfare   welfare   welfare  ... 
02  welfare  welfare  welfare  other    other     other     other    ...
03  
04
...
N  other   other     other    other    welfare   welfare   welfare  ...  

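I can calculate the duration of the first spell (see the code below), but I cannot figure out how to continue to the next spell for each id without repeating the same code over and over.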

%let uger=y_0950--y_1015;
%let welfare='welfare';
%let other='other';

/*Start welfare spell*/
data mydata;
set data;
array y(*) &uger;
do j=5 to 19 until (start);
if  y(j-1) ne &welfare and  
y(j-2) ne &welfare and  
y(j-3) ne &welfare and  
y(j-4) ne &welfare and
y(j) eq &welfare 
then start=j;
end;
if start>0 then output;
run;

/*end welfare spell*/
data mydata1;
set mydata;
array y(*) &uger;
do j=start to 19 until(ends);
if y(j) ne &welfare and
y(j+1) ne &welfare and
y(j+2) ne &welfare and
y(j+3) ne &welfare
then ends=j-1;
end;
/*other*/
do k=start to 19 until(other);
if y(k) eq &other and
y(k+1) eq &other and
y(k+2) eq &other and
y(k+3) eq &other
then other=k-1;
end;
if ends=. then censor=1;
if ends=. then ends=19;
if other >0 then delete;
duration= ends-start+1;
run; 

I would like to end up with data like the following (it does not correspond to the data example above):

ID  start  end  duration  censor   
01  5      10   6         0         
01  15     19   5         1  
02  6      12   7         0
03  ..
04  ..
04  ..
..
N  

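Assuming that censor indicates that the last observed period is "welfare", this should solve your problem.

There are several ways to solve this, but the key is to use output when you reach the value "other" after the value "welfare", and to reset the other variables to missing (0 would also work) after the output, so that you can start over.

Another thing to watch is the special handling of the last element of the array. See my comments in the code below.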

Create sample data

data welfare;
  input ID 
        y_0950 $ 
        y_0951 $
        y_0952 $
        y_0953 $
        y_1001 $
        y_1002 $
        y_1003 $;
  datalines;
01 other welfare welfare other other welfare welfare welfare
02 welfare welfare welfare other other other other
03 welfare welfare other other welfare welfare welfare
04 welfare other welfare welfare other welfare other
run;

Calculate the welfare periods

/* add y_: to the drop if you don't need them */
data welfare_periods (drop=i); 
  format ID start end duration censor 8.;
  set welfare;
  array y(*) y_:;    
  censor = 0;
  start = .;
  end = .;
  duration = .;

  /* Loop over every column y_... */
  do i = 1 to (dim(y)-1);
    if y(i) = "welfare" then do;
      /* Check if we have a new start */
      if duration = . then do;
        start = i;
        end = i; /* We'll increment this later if need be. */
        duration = 1; /* We'll increment this later if need be. */
      end;
      else if y(i-1) = "welfare" then do;
        end = end + 1;
        duration = duration + 1;
      end;
    end;

    /* When y(i) = "other" */
    else if y(i) = "other" then do;
      if duration > 0 then do;
        /* Output the row and reset the start/end/duration variables */
        output;
        start = .;
        end = .;
        duration = .;
      end;
    end;

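    /* Decide what to do when we reach last element minus 1 */
    if i = dim(y) - 1 then do;
      /* A spell is still open here, so y(i) = "welfare": the "else if" */
      /* block above already closed any spell where y(i) = "other" */
      if duration > 0 then do;
        if y(i+1) = "welfare" then do;
          end = end + 1;
          duration = duration + 1;
          censor = 1;
          output;
        end;
        else if y(i+1) = "other" then output;
      end;
      /* No open spell, but the last element starts a one-period spell */
      else if y(i+1) = "welfare" then do;
        start = i + 1;
        end = i + 1;
        duration = 1;
        censor = 1;
        output;
      end;
    end;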
  end;
run;
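
Note that start and end are positions in the array, not period labels. If the period labels are needed, they could be recovered with the vname function, roughly as sketched here. The data set name welfare_periods_named and the variables start_var and end_var are hypothetical names chosen for illustration, and the sketch assumes the y_ variables were not dropped from welfare_periods.

```sas
/* Sketch: translate array positions back to variable names.    */
/* welfare_periods_named, start_var and end_var are made-up     */
/* names; the y_ columns must still be present in the input.    */
data welfare_periods_named;
  set welfare_periods;
  array y(*) y_:;
  length start_var end_var $32;
  start_var = vname(y(start)); /* e.g. the name of the column where the spell starts */
  end_var   = vname(y(end));   /* the name of the column where the spell ends */
run;
```

The positions depend on the order of the y_ variables in the data set, so this only gives sensible labels if the columns are stored in chronological order.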

Result
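
To inspect the output, a minimal sketch using proc print could look like this, assuming the data set name welfare_periods from the step above:

```sas
/* Print one row per welfare spell; noobs hides the observation counter */
proc print data=welfare_periods noobs;
  var ID start end duration censor;
run;
```

Each id should appear once per spell, with censor = 1 only for a spell that is still running in the last observed period.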