SAS: Understanding the lag function to retain dates based on work progress
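I have a work progress sheet.
So, if we have a table with work progress recorded as new, progress, start, end and restart, some of the rules are:
First, when the work is NEW, the start date is set to '01/01/2013', and the following work_progress rows get the same value.
Second, if the work ENDs and is added again, the start date is set to '01/01/2016' (see work_id=3 below). The following work_progress rows must have the same value.
Last case: when the work (work_id 1 and 2) RESTARTs, the start date is set to the start of the received work. The later rows must have the same date,
'01/05/2017'. Below is the dataset that my logic currently outputs.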
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/01/2013 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2013 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2013 | 03/03/2017 |
| 3 | END | April 21, 2017 | 01/01/2013 | 04/04/2017 |
+---------+---------------+-------------------+------------+------------+
What my output should actually be:
+---------+---------------+-------------------+------------+------------+
| work_id | work_progress | received_date | start | end |
+---------+---------------+-------------------+------------+------------+
| 1 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 1 | END | January 1, 2017 | 01/01/2013 | 02/02/2017 |
| 1 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 1 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | PROGRESS | December 25, 2016 | 01/01/2013 | 31/12/2020 |
| 2 | END | January 1, 2017 | 01/01/2013 | 31/12/2020 |
| 2 | RESTART | February 5, 2017 | 01/05/2017 | 31/12/2020 |
| 2 | PROGRESS | March 20, 2017 | 01/05/2017 | 31/12/2020 |
| 3 | NEW | November 19, 2016 | 01/01/2013 | 31/12/2020 |
| 3 | END | December 25, 2016 | 01/01/2013 | 02/02/2017 |
| 3 | NEW | January 1, 2017 | 01/01/2016 | 31/12/2020 |
| 3 | END | February 5, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | March 20, 2017 | 01/01/2016 | 02/02/2017 |
| 3 | END | April 21, 2017 | 01/01/2016 | 02/02/2017 |
+---------+---------------+-------------------+------------+------------+
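Requirements:
- The start date must be carried forward from the NEW and RESTART rows.
- For work_id=3 with work_progress=END, the end date must be carried forward: the March and April rows should both have the February end date.
I need to use lag here to retain the start and end dates. I have implemented half of the problem's logic, everything except this lag part.
Partial SAS code: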
data m_out_ds;
set m_in_ds;
by work_id work_received_date;
/*--------
Some logic to derive my rules, that gave output, first table above.
----------*/
prevstart = lag(start);
prevend = lag(end);
prev_work_progress = lag(work_progress);
if work_progress = 'END' and prev_work_progress = 'END' then end = prevend;
/*---This gave 02/02/2017 for march received date only,
but we require for april too, obvious the work has ended.----*/
if work_progress = 'PROGRESS' and prev_work_progress ='RESTART'
then start = prevstart;
/*---This however worked---*/
run;
Let me know if anything here is unclear.
Thanks
This seems to match your data, but I am still not sure I understand the rules. First, let's turn your text into data.
data have ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW | November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
1|END | January 1, 2017 |01/01/2013|02/02/2017
1|RESTART | February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
2|NEW | November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS| December 25, 2016|01/01/2013|31/12/2020
2|END | January 1, 2017 |01/01/2013|31/12/2020
2|RESTART | February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS| March 20, 2017 |01/01/2013|31/12/2020
3|NEW | November 19, 2016|01/01/2013|31/12/2020
3|END | December 25, 2016|01/01/2013|02/02/2017
3|NEW | January 1, 2017 |01/01/2016|31/12/2020
3|END | February 5, 2017 |01/01/2013|02/02/2017
3|END | March 20, 2017 |01/01/2013|03/03/2017
3|END | April 21, 2017 |01/01/2013|04/04/2017
;
data want ;
infile cards dsd dlm='|' truncover ;
row+1;
length work_id 8 work_progress $8 received_date start end 8 ;
informat received_date anydtdte. start end ddmmyy.;
format received_date start end yymmdd10.;
input work_id -- end ;
CARDS;
1|NEW |November 19, 2016|01/01/2013|31/12/2020
1|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
1|END |January 1, 2017 |01/01/2013|02/02/2017
1|RESTART |February 5, 2017 |01/05/2017|31/12/2020
1|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
2|NEW |November 19, 2016|01/01/2013|31/12/2020
2|PROGRESS |December 25, 2016|01/01/2013|31/12/2020
2|END |January 1, 2017 |01/01/2013|31/12/2020
2|RESTART |February 5, 2017 |01/05/2017|31/12/2020
2|PROGRESS |March 20, 2017 |01/05/2017|31/12/2020
3|NEW |November 19, 2016|01/01/2013|31/12/2020
3|END |December 25, 2016|01/01/2013|02/02/2017
3|NEW |January 1, 2017 |01/01/2016|31/12/2020
3|END |February 5, 2017 |01/01/2016|02/02/2017
3|END |March 20, 2017 |01/01/2016|02/02/2017
3|END |April 21, 2017 |01/01/2016|02/02/2017
;
Now let's try to transform it.
data try ;
set have ;
by work_id;
retain new_start new_end ;
format new_start new_end yymmdd10.;
if first.work_id then call missing(of new_start new_end);
if work_progress in ('NEW','RESTART') then new_start=start ;
start=coalesce(new_start,start);
if work_progress in ('END') then do;
if missing(new_end) then new_end=end ;
end=coalesce(new_end,end);
end;
run;
proc compare data=want compare=try;
id row;
run;
proc print data=try; run;
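A side note on why the step above uses RETAIN rather than LAG. LAG returns whatever value its argument held the last time that LAG call executed. In the question's code `lag(end)` runs on every row before `end` is overwritten, so it always hands back the previous row's raw input value, never the corrected one. That would explain why only the March row picked up 02/02/2017. The sketch below illustrates the difference on a small hypothetical dataset; the names `demo`, `end_date`, `lag_end` and `kept_end` are made up for this example and are not part of the original data.

```sas
/* Hypothetical data: three consecutive END rows for a single work_id. */
data demo;
  input work_progress $ end_date :ddmmyy10.;
  format end_date ddmmyy10.;
  datalines;
END 02/02/2017
END 03/03/2017
END 04/04/2017
;

data lag_vs_retain;
  set demo;
  retain kept_end;
  format lag_end kept_end ddmmyy10.;

  /* LAG: value of end_date from the previous row as read from the input.
     On the first row LAG returns missing, so fall back to the row's own value. */
  lag_end = coalesce(lag(end_date), end_date);

  /* RETAIN: kept_end is not reset between rows, so the first END date
     it captures is carried forward to every later row. */
  if missing(kept_end) then kept_end = end_date;
run;

proc print data=lag_vs_retain; run;
```

In this sketch `lag_end` comes out as 02/02/2017, 02/02/2017, 03/03/2017, while `kept_end` stays at 02/02/2017 on all three rows. With several work_id values you would also clear the retained variable at `first.work_id`, as the `try` step above does.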