Group By Interpolation Based on the Previous Row
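The goal is to add a new row whenever there is a gap in the date variable between two consecutive rows with the same id.
When a gap occurs, the earlier (first) row should be copied, except for the date: it should not be taken from the first row but incremented by one day.
Everything also needs to be grouped by id. I need to do this without using an expand function.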
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;
Desired result:
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-03 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-02 4 V
2 2020-01-03 1 B
2 2020-01-04 1 B
2 2020-01-05 9 F
;
data sample;
set sample;
format date yymmdd10.;
run;
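You can do a 1:1 self-merge with a second copy of the data set that starts at row 2, which supplies the lead (next-row) values. A 1:1 merge does not use a BY statement.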
Example:
data have;
input id date numeric_feature character_feature $;
informat date yymmdd10.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
data want;
  * 1:1 merge without by statement;
  merge
    have                         /* start at row 1 */
    have ( firstobs=2            /* start at row 2 for lead values */
           keep=id date          /* more data set options that prepare the lead */
           rename = ( id=nextid
                      date=nextdate
                    ))
  ;
  output;
  flag = '*';                    /* marker for filled in dates */
  if id = nextid then
    do date=date+1 to nextdate-1;
      output;
    end;
  drop next:;
run;
The result flags the filled-in dates with *.
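Both answers rely on the fact that a SAS date value is a count of days since 1 January 1960, so date+1 is the next calendar day. The loop from date+1 to nextdate-1 therefore covers exactly the missing days, and it runs zero times when the next row is the following day (start greater than stop). A small standalone sketch of that arithmetic, using two dates from id 1 in the sample (the variable names here are illustrative only):

```sas
/* SAS dates are day counts (days since 01JAN1960), so +1 means "next day". */
data _null_;
  this_date = '02JAN2020'd;    /* date on the current row (id 1)   */
  next_date = '04JAN2020'd;    /* date on the following row (id 1) */
  gap = next_date - this_date;
  put gap=;                    /* writes gap=2: one day is missing */
  /* Same bounds as the answers: first missing day .. day before the next row */
  do d = this_date + 1 to next_date - 1;
    put d= yymmdd10.;          /* writes d=2020-01-03 */
  end;
  /* Consecutive dates (01-04 then 01-05): start > stop, so zero iterations */
  do d = '04JAN2020'd + 1 to '05JAN2020'd - 1;
    put 'never reached';
  end;
run;
```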
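To "look ahead", you can re-read the same data set starting from the second observation. A DATA step stops as soon as any SET statement tries to read past the end of its input, so the second SET also reads one extra, empty observation (sample(obs=1 drop=_all_)); without it, the step would stop before the last row is output.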
data sample;
input id date numeric_feature character_feature $;
informat date yymmdd.;
format date yymmdd10.;
datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;
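data want;
  set sample;
  by id;
  /* look-ahead: the next row's date; the extra empty observation keeps this
     SET from stopping the step before the last row is output */
  set sample(firstobs=2 keep=date rename=(date=next_date)) sample(obs=1 drop=_all_);
  output;
  /* within the same id, output one copy of the row per missing day */
  if not last.id then do date=date+1 to next_date-1; output; end;
run;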
Result:
Obs  id  date        numeric_feature  character_feature  next_date
  1   1  2020-01-01  5                A                  2020-01-02
  2   1  2020-01-02  3                Z                  2020-01-04
  3   1  2020-01-03  3                Z                  2020-01-04
  4   1  2020-01-04  2                D                  2020-01-05
  5   1  2020-01-05  7                B                  2020-01-01
  6   2  2020-01-01  4                V                  2020-01-03
  7   2  2020-01-02  4                V                  2020-01-03
  8   2  2020-01-03  1                B                  2020-01-05
  9   2  2020-01-04  1                B                  2020-01-05
 10   2  2020-01-05  9                F                  .
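If you want the look-ahead output closer to the layout asked for in the question, a minimal variation is to drop the helper variable next_date. The sketch below also borrows the flag marker from the merge answer so the generated rows are easy to spot; flag is not part of the question's desired output, so drop it as well if you only want the original columns. It assumes, like both answers, that the input is sorted by id and date.

```sas
data sample;
  input id date numeric_feature character_feature $;
  informat date yymmdd10.;
  format date yymmdd10.;
  datalines;
1 2020-01-01 5 A
1 2020-01-02 3 Z
1 2020-01-04 2 D
1 2020-01-05 7 B
2 2020-01-01 4 V
2 2020-01-03 1 B
2 2020-01-05 9 F
;

data want;
  set sample;
  by id;
  /* look-ahead SET plus one empty observation, as in the answer above */
  set sample(firstobs=2 keep=date rename=(date=next_date))
      sample(obs=1 drop=_all_);
  length flag $1;                /* not retained: blank at the start of each row */
  output;                        /* the original row, flag blank */
  flag = '*';                    /* mark the rows generated below */
  if not last.id then do date = date + 1 to next_date - 1;
    output;
  end;
  drop next_date;                /* helper variable no longer needed */
run;
```

With the sample data, WANT has 10 rows, and the three generated rows (id 1 on 2020-01-03, id 2 on 2020-01-02 and 2020-01-04) carry flag = '*'.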