Query data from previous row and update current row

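I have a table with some missing data, so I have to replace the missing data with the data from the previous day.

I would like to do this with an SQL update.

The condition is: ID1 is in the set (a1, a2, a3) AND Type is missing.

When that is the case, the variables "Amount a" and "Amount b" hold an absurd value.

We then take the Amount a/b values from the previous day's row whose ID1 and ID2 match those of the row that meets the condition.

In the example below, ID1 and ID2 equal a1 and b1, so we look up a1 and b1 on the previous day (10/03/2021), get the amounts 28.45 / 29.46, and use them to replace the bogus amounts 454848.25 / 548926.36.

We also copy the Type value.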

ID1  ID2  Amount a   Amount b   day         Type
a1   b1   28.45      29.46      10/03/2021  Out
a2   b1   36.84      37.88      10/03/2021  In
a1   b1   454848.25  548926.36  11/03/2021  /MISSING/

Target:

ID1  ID2  Amount a   Amount b   day         Type
a1   b1   28.45      29.46      10/03/2021  Out
a2   b1   36.84      37.88      10/03/2021  In
a1   b1   28.45      29.46      11/03/2021  Out

My table has several thousand rows, but this is the idea.

I tried using lag and an SQL update, but without success.

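If the conditions are met, replace the absurd values with proper missing values and store a pointer to the previous day. Then use the initial table to do a lookup.
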
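data have;
    infile datalines delimiter='|';
    input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
    format day ddmmyy10.;
    datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;
run;

/* Flag the rows to repair: blank the bogus amounts and remember the previous day. */
data stage1;
    set have;
    if ID1 in ('a1','a2','a3') and type = "" then do;
        Amount_A = .;
        Amount_B = .;
        _date = day - 1;
    end;
    format _date ddmmyy10.;
run;

data want;
    /* Define the variables in the PDV without reading any rows. */
    if 0 then set have;
    if _n_ = 1 then do;
        /* Load the original table into a hash keyed by ID1, ID2 and day. */
        declare hash h(dataset:'have');
        h.definekey('ID1','ID2','day');
        /* type is retrieved too, so it is copied rather than hard-coded. */
        h.definedata('Amount_A','Amount_B','type');
        h.definedone();
    end;

    set stage1;
    /* Unflagged rows have a missing _date, so the lookup fails and leaves them untouched. */
    rc = h.find(key:ID1, key:ID2, key:_date);
    drop rc _date;
run;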
ID1 ID2 Amount_A Amount_B    day      type
a1  b1  28.45     29.46   10/03/2021  Out
a2  b1  36.84     37.88   10/03/2021  In
a1  b1  28.45     29.46   11/03/2021  Out

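Why not simply sort by the keys and the date, then use the lag function to look back one row? Perhaps you already tried this, but only called lag inside the "if type is missing" block. As the documentation states, that will not help: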

Storing values at the bottom of the queue and returning values from the top of the queue occurs only when the function is executed. An occurrence of the LAGn function that is executed conditionally stores and returns values only from the observations for which the condition is satisfied.
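
To make the quoted behaviour concrete, here is a minimal sketch on made-up data. The data set name demo and the variables x, prev_always and prev_cond are hypothetical and not part of the question.

```sas
data demo;
    input x;
    /* Unconditional: LAG runs on every row, so it returns the previous row's x. */
    prev_always = lag(x);
    /* Conditional: this LAG only runs on even rows, so its queue only
       ever holds values taken from earlier even rows. */
    if mod(_n_, 2) = 0 then prev_cond = lag(x);
    datalines;
1
2
3
4
;
run;

proc print data=demo;
run;
```

On row 4, prev_always is 3 (the previous row) while prev_cond is 2 (the previous row on which the conditional LAG was executed). On rows 1 to 3, prev_cond is missing.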

Instead, compute the lag for every row, not just for the rows that satisfy the condition.

proc sort data=have;
  by ID1 ID2 day;
run;

data want;
  set have;
  by ID1 ID2;
  /* Call LAG unconditionally so every row feeds the queue. */
  lag_amount_a = lag(amount_a);
  lag_amount_b = lag(amount_b);
  lag_day = lag(day);
  lag_type = lag(type);
  if ID1 in ("a1", "a2", "a3") and missing(type) then do;
    /* Check that the row before has the same ID1 and ID2 and is dated day - 1. */
    if not first.ID2 and day = lag_day + 1 then do;
      amount_a = lag_amount_a;
      amount_b = lag_amount_b;
      type = lag_type;
    end;
  end;
  drop lag_:;
run;

This looks like a simple LOCF (last observation carried forward). Thanks to @kermit for the data.

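data have;
   infile datalines delimiter='|';
   input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
   format day ddmmyy10.;
   datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;;;;
   run;
/* UPDATE only carries forward over missing values, so blank the bogus amounts first. */
data flagged;
   set have;
   if id1 in ('a1','a2','a3') and missing(type) then call missing(amount_a, amount_b);
   run;
/* Sort by day within the keys so "last observation" means the most recent earlier row. */
proc sort data=flagged;
   by id1 id2 day;
   run;
/* LOCF: the master is empty, so each row acts as a transaction on its BY group.
   Missing transaction values never overwrite, which carries the last value forward.
   Note that this fills every missing value in the group, not only the flagged ones. */
data LOCF;
   update flagged(obs=0) flagged;
   by id1 id2;
   output;
   run;
proc print;
   run;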
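
Since the question asked for an SQL update, here is a minimal PROC SQL sketch of the same idea. It is not taken from any of the answers. It assumes the have data set created above, assumes that (ID1, ID2, day) is unique so each subquery returns at most one row, and uses a hypothetical snapshot table named prev so the subqueries never read the table being updated.

```sas
proc sql;
    /* prev is a hypothetical copy of have, used only as the lookup source. */
    create table prev as
        select ID1, ID2, day, Amount_A, Amount_B, type
        from have;

    /* For each flagged row, pull the values from the same keys one day earlier. */
    update have as h
        set Amount_A = (select p.Amount_A from prev as p
                        where p.ID1 = h.ID1 and p.ID2 = h.ID2
                          and p.day = h.day - 1),
            Amount_B = (select p.Amount_B from prev as p
                        where p.ID1 = h.ID1 and p.ID2 = h.ID2
                          and p.day = h.day - 1),
            type     = (select p.type from prev as p
                        where p.ID1 = h.ID1 and p.ID2 = h.ID2
                          and p.day = h.day - 1)
        where h.ID1 in ('a1', 'a2', 'a3')
          and missing(h.type)
          /* Only touch rows that actually have a previous-day match. */
          and exists (select 1 from prev as p
                      where p.ID1 = h.ID1 and p.ID2 = h.ID2
                        and p.day = h.day - 1);
quit;
```

Because the lookup reads the untouched snapshot, a flagged row whose previous day was itself flagged would receive that day's bogus values. For runs of consecutive bad days, the LOCF approach is likely the better fit.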