SAS 编程 - 删除重复项
SAS Programming - removing duplicates
根据城市之间的距离,我有以下数据。
Source Destination Distance
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
我想删除源和目标相同的重复项。例如 USA 到 UK 将与 UK 到 USA 因此需要删除重复值。
以下是所需的输出。
Source Destination Distance
USA UK 1000
USA Spain 200
Germany Spain 500
您必须为所需的所有路线创建维度/查找 table,然后查找值以标准化所需的输出。
我创建了一个名为 Routes 的查找 table,以及包含要查找的所有对值的变量。
完整代码:
data have;
input Source $ Destination $ Distance ;
datalines;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
run;
data routes;
infile datalines dsd dlm=',';
length pairs .;
input Source $ Destination $ Distance Pairs $ ;
datalines;
USA,UK,1000,USA-UK/UK-USA
USA,Spain,200,USA-Spain/Spain-USA
Germany, Spain,500,Germany-Spain/Spain-Germany
;
run;
proc sql;
create table want as
Select distinct
t2.Source, t2.Destination, t2.Distance
from have t1 inner join routes t2 on
t2.Pairs contains catx('-',t1.Source,t1.Destination) or
t2.Pairs contains catx('-',t1.Destination,t1.Source)
;
quit;
输出:
Source=Germany Destination=Spain Distance=500
Source=USA Destination=Spain Distance=200
Source=USA Destination=UK Distance=1000
首先通过调用 sortc 生成一个虚拟变量来保存排序的源和目标,然后按虚拟变量排序。
data have;
input Source $ Destination $ Distance;
cards;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
data temp;
set have;
length dummy .;
_var1=source; _var2=destination;
call sortc (of _:);
dummy=catx(' ',of _:);
drop _:;
run;
proc sort data=temp out=want(drop=dummy) nodupkey;
by dummy;
run;
根据城市之间的距离,我有以下数据。
Source Destination Distance
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
我想删除源和目标相同的重复项。例如 USA 到 UK 将与 UK 到 USA 因此需要删除重复值。
以下是所需的输出。
Source Destination Distance
USA UK 1000
USA Spain 200
Germany Spain 500
您必须为所需的所有路线创建维度/查找 table,然后查找值以标准化所需的输出。
我创建了一个名为 Routes 的查找 table,以及包含要查找的所有对值的变量。
完整代码:
data have;
input Source $ Destination $ Distance ;
datalines;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
run;
data routes;
infile datalines dsd dlm=',';
length pairs .;
input Source $ Destination $ Distance Pairs $ ;
datalines;
USA,UK,1000,USA-UK/UK-USA
USA,Spain,200,USA-Spain/Spain-USA
Germany, Spain,500,Germany-Spain/Spain-Germany
;
run;
proc sql;
create table want as
Select distinct
t2.Source, t2.Destination, t2.Distance
from have t1 inner join routes t2 on
t2.Pairs contains catx('-',t1.Source,t1.Destination) or
t2.Pairs contains catx('-',t1.Destination,t1.Source)
;
quit;
输出:
Source=Germany Destination=Spain Distance=500
Source=USA Destination=Spain Distance=200
Source=USA Destination=UK Distance=1000
首先通过调用 sortc 生成一个虚拟变量来保存排序的源和目标,然后按虚拟变量排序。
data have;
input Source $ Destination $ Distance;
cards;
USA UK 1000
USA Spain 200
UK USA 1000
Germany Spain 500
Spain USA 200
;
data temp;
set have;
length dummy .;
_var1=source; _var2=destination;
call sortc (of _:);
dummy=catx(' ',of _:);
drop _:;
run;
proc sort data=temp out=want(drop=dummy) nodupkey;
by dummy;
run;