Query data from previous row and update current row
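I have a table with some missing data, so I need to replace the missing data with the previous day's data.
I would like to do this with an SQL update.
The condition is: ID1 is in the set (a1, a2, a3) AND Type is missing.
When that happens, the Amount a/b variables hold an absurd value.
We then take the Amount a/b values from the previous day's row whose ID1 and ID2 match those of the row that meets the condition.
Here ID1 and ID2 are a1 and b1, so we look up a1 and b1 on the previous day (10/03/2021) and get the amounts 28.45/29.46, which replace the bogus amounts 454848.25/548926.36.
We also copy the Type value.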
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 454848.25 | 548926.36 | 11/03/2021 | /MISSING/ |
Goal:
ID1 | ID2 | Amount a | Amount b | day | Type |
---|---|---|---|---|---|
a1 | b1 | 28.45 | 29.46 | 10/03/2021 | Out |
a2 | b1 | 36.84 | 37.88 | 10/03/2021 | In |
a1 | b1 | 28.45 | 29.46 | 11/03/2021 | Out |
My table has several thousand rows, but this is the idea.
I tried using lag and an SQL update, without success.
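If the condition is met, replace the absurd values with missing values and store a pointer to the previous day.
Then look the values up in the initial table.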
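data have;
  infile datalines delimiter='|';
  input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
  format day ddmmyy10.;
  datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;
/* Blank out the absurd amounts on flagged rows and remember the previous day */
data stage1;
  set have;
  if ID1 in ('a1','a2','a3') and type = "" then do;
    Amount_A = .;
    Amount_B = .;
    _date = day - 1;
  end;
  format _date ddmmyy10.;
run;
/* Look up Amount_A, Amount_B and type for (ID1, ID2, previous day) in the original table */
data want;
  if 0 then set have;
  if _n_ = 1 then do;
    declare hash h(dataset:'have');
    h.definekey('ID1','ID2','day');
    /* type is copied from the previous day as well, rather than hard-coded */
    h.definedata('Amount_A','Amount_B','type');
    h.definedone();
  end;
  set stage1;
  /* Only flagged rows carry a non-missing _date; other rows are left untouched */
  if not missing(_date) then rc = h.find(key:ID1, key:ID2, key:_date);
  drop rc _date;
run;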
ID1 ID2 Amount_A Amount_B day type
a1 b1 28.45 29.46 10/03/2021 Out
a2 b1 36.84 37.88 10/03/2021 In
a1 b1 28.45 29.46 11/03/2021 Out
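Since the question asks for an SQL approach, the same previous-day lookup can also be sketched in PROC SQL. This is only a minimal sketch under a few assumptions: the output table name want_sql and the aliases cur and prev are made up for illustration, and there is at most one row per ID1, ID2 and day. It builds a new table from a self-join rather than updating have in place.

```sas
proc sql;
  create table want_sql as
  select cur.ID1,
         cur.ID2,
         /* take the previous day's values only on flagged rows that have a match */
         case when cur.ID1 in ('a1','a2','a3') and missing(cur.type)
                   and prev.ID1 is not null
              then prev.Amount_A else cur.Amount_A end as Amount_A,
         case when cur.ID1 in ('a1','a2','a3') and missing(cur.type)
                   and prev.ID1 is not null
              then prev.Amount_B else cur.Amount_B end as Amount_B,
         cur.day,
         case when cur.ID1 in ('a1','a2','a3') and missing(cur.type)
                   and prev.ID1 is not null
              then prev.type else cur.type end as type
  from have as cur
  /* prev is the row with the same ID1 and ID2 dated exactly one day earlier */
  left join have as prev
    on  prev.ID1 = cur.ID1
    and prev.ID2 = cur.ID2
    and prev.day = cur.day - 1
  order by cur.ID1, cur.ID2, cur.day;
quit;
```

Flagged rows with no previous-day match keep their original values, so they can still be spotted afterwards by their missing type.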
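Why not simply sort by the keys and the date, then use the lag function to look back one row? Perhaps you already tried that, but called lag only inside the "if type is missing" block. As the documentation explains, that will not help: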
Storing values at the bottom of the queue and returning values from
the top of the queue occurs only when the function is executed. An
occurrence of the LAG n function that is executed conditionally stores
and return values only from the observations for which the condition
is satisfied.
Instead, compute the lag on every row, not only on the rows that meet the condition.
proc sort data=have;
  by ID1 ID2 day;
run;
data want;
  set have;
  by ID1 ID2;
  /* call lag on every row so each queue always holds the previous observation */
  lag_amount_a = lag(amount_a);
  lag_amount_b = lag(amount_b);
  lag_day = lag(day);
  lag_type = lag(type);
  if ID1 in ("a1", "a2", "a3") and missing(type) then do;
    /* check that the previous row has the same ID1 and ID2 and is dated day - 1 */
    if not first.ID2 and day = lag_day + 1 then do;
      amount_a = lag_amount_a;
      amount_b = lag_amount_b;
      type = lag_type;
    end;
  end;
  /* the helper columns are not part of the target table */
  drop lag_:;
run;
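To see the effect the documentation describes, a small side-by-side comparison may help. This is only an illustrative sketch: the data set name lag_demo and the variables lag_always and lag_conditional are invented here, and have is assumed to be the three-row sample from the question.

```sas
proc sort data=have;
  by ID1 ID2 day;
run;

data lag_demo;
  set have;
  /* executed on every row: the queue always holds the previous row's value */
  lag_always = lag(Amount_A);
  /* executed only when type is missing: this queue only ever sees those rows */
  if missing(type) then lag_conditional = lag(Amount_A);
run;

proc print data=lag_demo;
run;
```

On the sample data, lag_always is 28.45 on the flagged a1/b1 row of 11/03/2021, while lag_conditional is missing there, because that row is the first time the conditional lag call runs and its queue is still empty.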
This looks like a simple LOCF (last observation carried forward). Thanks to @kermit for the data.
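data have;
  infile datalines delimiter='|';
  input ID1 $ ID2 $ Amount_A Amount_B day :ddmmyy10. type $;
  format day ddmmyy10.;
  datalines;
a1|b1|28.45|29.46|10/03/2021|Out
a2|b1|36.84|37.88|10/03/2021|In
a1|b1|454848.25|548926.36|11/03/2021|
;;;;
run;
/* sort by day as well, so that "last observation" means the most recent earlier day */
proc sort data=have;
  by id1 id2 day;
run;
/* UPDATE does not overwrite a value with a missing one, so within each id1/id2 group
   missing values are filled from the last non-missing value. In this sample only type
   is missing on the flagged row; the amounts are present and are left unchanged. */
data LOCF;
  update have(obs=0) have;
  by id1 id2;
  output;
run;
proc print;
run;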
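The UPDATE statement only carries a value forward when the incoming value is missing, and in the sample data the absurd amounts on the flagged row are not missing. One possible adjustment, sketched here under that reading of the question, is to blank the amounts on flagged rows first and then apply the same UPDATE trick. The data set names have_blank and locf_fixed are hypothetical.

```sas
/* set the absurd amounts to missing on the rows that meet the question's condition */
data have_blank;
  set have;
  if ID1 in ('a1','a2','a3') and missing(type) then call missing(Amount_A, Amount_B);
run;

proc sort data=have_blank;
  by ID1 ID2 day;
run;

/* empty master plus full transaction: missing values never overwrite what is held,
   so each output row shows the last non-missing value within its ID1/ID2 group */
data locf_fixed;
  update have_blank(obs=0) have_blank;
  by ID1 ID2;
  output;
run;
```

Note that this carries forward the last non-missing values within an ID1/ID2 group whatever their date, so it matches the "previous day" rule only when the group has a row for every day.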