Filling in missing values with a forward-backward method using lag in SAS

Question

Suppose you have a table that contains a user name, a counter, and a score for each counter.

data have;
input user $  counter  score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;

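Some scores are missing between counters, and you want to fill each missing score with the score of the previous counter (or of the next counter where there is no previous score). So the result should look like this: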

I managed to forward-fill the missing score values by using the lag function, as follows:

data result1a;
  set have(keep=user);
  by user;

  *Look ahead;
    merge have have(firstobs=2 keep=score rename=(score=_NextScore));

    if first.user then do;
        if score= . then score=_NextScore;
        end;
    else do;
        _PrevScore = lag(score);
        if score= . then score=_PrevScore;
    end;
    output;
run;

Then I sorted the table in reverse by using the DESCENDING option on counter, as follows:

proc sort data = result1a out= result1b; 
by user descending counter ;
run;

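Finally, I use the lag function again to forward-fill the missing values in the rearranged table (which amounts to filling backward relative to the original table), as shown below.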

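I used the lag function inside the DO block because I want the previous value to be updated at every step. For example, the value 40 should be carried all the way from the first score to the last score in the group.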

However, I get strange results: none of the missing values actually get filled. Any ideas on how to fix the last data step?

data result1c;
set result1b;
by user;

   if first.user then do;
        if score= . then score=_NextScore;
        else score = score;

        end;
   else do;
        _PrevScore = lag(score);
        if score= . then 
        score=_PrevScore;
        else score = score;
   end;
   output;
run;

Answer 1

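lag() is an often-misunderstood function. Its name suggests that when you call it, SAS looks back at the previous row and fetches the value, but that is not what happens.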

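In fact, lag<n>() creates a "queue" of n values. When you call lag<n>(x), it pushes the current value of x into that queue and reads the oldest value out of it. There is one push each time the call executes, so normally once per row. This means that if lag<n>() sits inside a condition, the push happens only on rows where the condition is true. The value you get back then comes from the last row where the condition was true, not necessarily from the previous row.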
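A small demo of the queue behaviour described above may help. This is a sketch: the dataset lag_demo and the variables x, every_row and only_even are hypothetical names chosen for illustration. The same lag(x) expression is written twice. One copy runs on every row. The other runs only when x is even, so its queue never sees the odd values.

```sas
/* Hypothetical demo data: x = 1..6 */
data lag_demo;
  input x;
  /* Executes on every row, so it returns the previous row's x:
     every_row = . 1 2 3 4 5 */
  every_row = lag(x);
  /* Executes only on even rows. This is a separate queue that only ever
     receives even values: only_even = . . . 2 . 4 */
  if mod(x, 2) = 0 then only_even = lag(x);
  cards;
1
2
3
4
5
6
;
run;
```

On row 4, only_even is 2 and not 3, because row 3 never executed that particular lag call.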

To fix your problem, the lag() call has to run on every row, and only after the score has been corrected:

data result1c;
set result1b;
by user;
if first.user then do;
    if score= . then score=_NextScore;
    else score = score;
end;
else do;
    if score= . then 
    score=_PrevScore;
    else score = score;
end;
_PrevScore = lag(score);
output;
run;

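Edit: I was so focused on the misuse of lag that I didn't offer a workable alternative. Because you are modifying score as you go, lag is simply the wrong tool here. retain will work: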

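data result1c;
/* Drop the helper columns carried over from result1a. A variable read by
   SET is overwritten on every row, which would defeat the RETAIN below.
   _NextScore also refers to the original ascending order, not this one. */
set result1b(drop=_PrevScore _NextScore);
by user;
retain _PrevScore;
/* In descending order the first row of each user has nothing before it,
   so only the later rows are filled from the retained value. */
if not first.user and score = . then score = _PrevScore;
/* Remember the (possibly filled) score for the next row of this user. */
_PrevScore = score;
output;
run;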
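The forward pass in result1a also calls lag conditionally and pushes the score before it is filled, so it can leave gaps too. Below is a minimal end-to-end sketch that applies the same retain idea to both passes, starting from the have dataset in the question. The names fwd, bwd, want_fb and _last are hypothetical. The sketch assumes, as the question does, that scores are never carried across users.

```sas
data have;
input user $ counter score;
cards;
A 1 .
A 2 .
A 3 40
A 4 .
A 5 20
A 6 .
B 1 30
B 2 .
C 1 .
C 2 .
C 3 .
;
run;

/* Pass 1: forward fill within each user (carry the last non-missing score down). */
data fwd;
  set have;
  by user;
  retain _last;
  if first.user then _last = .;          /* never carry a value across users */
  if missing(score) then score = _last;
  _last = score;
  drop _last;
run;

/* Reverse the order within each user so leading gaps become trailing gaps. */
proc sort data=fwd out=bwd;
  by user descending counter;
run;

/* Pass 2: forward fill again on the reversed order, i.e. a backward fill. */
data bwd;
  set bwd;
  by user;
  retain _last;
  if first.user then _last = .;
  if missing(score) then score = _last;
  _last = score;
  drop _last;
run;

/* Restore the original order. */
proc sort data=bwd out=want_fb;
  by user counter;
run;
```

With the sample data this gives 40 40 40 40 20 20 for A and 30 30 for B, and C stays missing because it has no score at all.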

Answer 2

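There is no need for lag; use retain (or an equivalent). Here is a double DoW-loop solution that does the whole job in one data step. Effectively it is also one read: the reads are buffered, so this is about as efficient as a single pass.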

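First, we loop over the user's rows to find the first non-missing score, which gives us the initial prev_score value. Then we set it, loop over that user's rows a second time, and output them. There is no actual retain here because I'm doing the looping myself, but it behaves as if there were a retain prev_score; in a normal data step loop. I deliberately don't retain it, because I want it to be reset when a new user starts.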

data want;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(first_score) and not missing(score) then 
      first_score = score;

  end;
  prev_score = first_score;
  do _n_ = 1 by 1 until (last.user);
    set have;
    by user;
    if missing(score) then
      score = prev_score;
    prev_score = score;
    output;
  end;
run;
