Calculate maximum difference between grouped rows
I have the following data, where the members of each household are sorted by age (oldest to youngest):
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
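For each household, I want to calculate the largest age difference between consecutive members when they are ordered from oldest to youngest. For this example, that gives 5 (=25-20), 16 (=32-16) and 32 (=42-10) for households 1, 2 and 3 respectively.
I could do this with a lot of merges (i.e. extract person 1, merge with an extract of person 2, and so on), but since a household can have upwards of 20 people, I am looking for a more direct approach.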
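/* Keep each household's rows together, in PersonID (oldest-first) order */
proc sort data=houses; by houseid personid age; run;
data _t1;
set houses;
/* dif1(age) is current age minus previous age; negate it to get previous minus current */
diff = dif1(age) * (-1);
/* Person 1 has no older member in the same household */
if personid = 1 then diff = .;
run;
proc sql;
create table want as
select houseid, max(diff) as Max_Diff
from _t1
group by houseid;
quit;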
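proc sort data = houses;
by houseid descending age;
run;
/* Write to a new dataset (named house_diff here) rather than overwriting the input */
data house_diff;
set houses;
by houseid;
lag_age = lag1(age);
age_diff = lag_age - age;
/* Reset after the subtraction, otherwise the reset is overwritten */
if first.houseid then age_diff = 0;
run;
proc sql;
select houseid, max(age_diff) as max_age_diff
from house_diff
group by houseid;
quit;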
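How it works:
First, the dataset is sorted by houseid and descending age.
The DATA step then computes the difference between the age on the previous row (returned by LAG) and the age on the current row (in the PDV). Finally, PROC SQL returns the maximum age difference for each houseid.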
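To make the intermediate values visible, here is a minimal sketch that prints the per-row gaps before they are aggregated. It assumes the `houses` dataset from the question; the names `houses_sorted`, `gaps` and `prev_age` are made up for illustration.

```sas
/* Sketch only: show the row-by-row gaps for the sample data. */
proc sort data=houses out=houses_sorted;
  by houseid descending age;
run;

data gaps;
  set houses_sorted;
  by houseid;
  /* Call LAG on every row so its internal queue stays in step with the data */
  prev_age = lag(age);
  /* Do not carry an age across a household boundary */
  if first.houseid then prev_age = .;
  /* Leave age_diff missing for the oldest member of each household */
  if not first.houseid then age_diff = prev_age - age;
run;

proc print data=gaps noobs;
  var houseid personid age prev_age age_diff;
run;
```

For household 2 the sorted ages are 32, 16, 14, 12, so the gaps are missing, 16, 2 and 2, and the maximum is 16, matching the value expected in the question.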
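Here is a two-pass solution. The first step is the same as in the two solutions above: sort by age within each household. In the second step, max_diff is tracked row by row and the result is output at the last record of each HouseID. This takes only two passes over the data (the sort and one DATA step).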
/* Sort oldest first so that dif1(age)*-1 below is a positive gap */
proc sort data=houses; by houseid descending age; run;
data want;
set houses;
by houseID;
retain max_diff 0;
diff = dif1(age)*-1;
if first.HouseID then do;
diff = .; max_diff=.;
end;
if diff>max_diff then max_diff=diff;
if last.houseID then output;
keep houseID max_diff;
run;
One more to add. This is a condensed version of Reeza's answer.
/* No need to sort by PersonID as age is the only concern */
proc sort data = houses;
by HouseID Age;
run;
data want;
set houses;
by HouseID;
/* Keep the diff when a new row is loaded */
retain diff;
/* Only replace the diff if it is larger than previous */
diff = max(diff, abs(dif(Age)));
/* Reset diff for each new house */
if first.HouseID then diff = 0;
/* Only output the final diff for each house */
if last.HouseID;
keep HouseID diff;
run;
Here is an example that uses FIRST. and LAST. and goes through the data once (after sorting).
data houses;
input HouseID PersonID Age;
datalines;
1 1 25
1 2 20
2 1 32
2 2 16
2 3 14
2 4 12
3 1 44
3 2 42
3 3 10
3 4 5
;
run;
Proc sort data=HOUSES;
by houseid descending age ;
run;
Data WANT(keep=houseid max_diff);
format houseid max_diff;
retain max_diff age1 age2;
Set HOUSES;
by houseid descending age ;
if first.houseid and last.houseid then do;
max_diff=0;
output;
end;
else if first.houseid then do;
call missing(max_diff,age1,age2);
age1=age;
end;
else if not(first.houseid or last.houseid) then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
age1=age;
end;
else if last.houseid then do;
age2=age;
temp=age1-age2;
if temp>max_diff then max_diff=temp;
output;
end;
Run;