SAS Programming - removing duplicates

I have the following data on the distances between cities.

Source   Destination  Distance
USA      UK           1000
USA      Spain        200
UK       USA          1000
Germany  Spain        500
Spain    USA          200

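I want to remove the duplicates where the source and destination form the same pair. For example, USA-UK is the same route as UK-USA, so one of the two rows needs to be removed.

Below is the required output.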

Source   Destination  Distance
USA      UK           1000
USA      Spain        200
Germany  Spain        500

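You have to create a dimension/lookup table for all the routes you need, then look up the values against it to normalize the data into the required output.

I created a lookup table named Routes, with a variable (Pairs) that holds both orderings of every pair to look up.

Full code: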

data have;
input Source $ Destination $ Distance ;
datalines;
USA       UK        1000
USA      Spain      200
UK        USA       1000
Germany  Spain      500
Spain     USA       200
;
run;
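/* Lookup table: one row per route, with both orderings listed in Pairs */
data routes;
infile datalines dsd dlm=',';
length Pairs $ 50;   /* wide enough for the longest pair string */
input Source $ Destination $ Distance Pairs $ ;
datalines;
USA,UK,1000,USA-UK/UK-USA
USA,Spain,200,USA-Spain/Spain-USA
Germany,Spain,500,Germany-Spain/Spain-Germany
;
run;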

proc sql;
create table want as
     Select distinct 
     t2.Source, t2.Destination,  t2.Distance
     from have  t1 inner join routes t2 on
      t2.Pairs contains catx('-',t1.Source,t1.Destination) or 
      t2.Pairs contains catx('-',t1.Destination,t1.Source)
;
quit;

Output:

 Source=Germany Destination=Spain Distance=500 
 Source=USA Destination=Spain Distance=200 
 Source=USA Destination=UK Distance=1000

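First, build a dummy variable that holds the source and destination in sorted order by calling sortc, then sort by that dummy variable with nodupkey so only one row per pair is kept.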

data have;
input Source $ Destination $ Distance;
cards;
USA       UK        1000 
USA      Spain      200 
UK        USA       1000 
Germany  Spain      500 
Spain     USA       200 
;

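data temp;
    set have;
    length dummy $ 20;                 /* holds the order-independent key */
    _var1=source; _var2=destination;   /* temporary copies to sort */
    call sortc (of _:);                /* sort the two copies alphabetically */
    dummy=catx(' ',of _:);             /* e.g. "UK USA" for both USA-UK and UK-USA */
    drop _:;
run;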

proc sort data=temp out=want(drop=dummy) nodupkey;
by dummy;
run;
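
If the original row order matters, the same order-independent key can be used without sorting. The following is only a sketch of that idea, assuming the `have` data set defined above; the names `want_hash`, `seen`, and `key` are made up for illustration, and the comparison is case-sensitive, so "usa" and "USA" would count as different values.

```sas
data want_hash;
    length key $ 20;
    /* Create the hash object once; it remembers the keys already seen */
    if _n_ = 1 then do;
        declare hash seen();
        seen.defineKey('key');
        seen.defineDone();
    end;
    set have;
    /* Build the key with the alphabetically smaller value first */
    if Source <= Destination then key = catx('|', Source, Destination);
    else key = catx('|', Destination, Source);
    /* check() returns a nonzero code when the key is not in the hash yet */
    if seen.check() ne 0 then do;
        seen.add();
        output;   /* keep only the first occurrence of each pair */
    end;
    drop key;
run;
```

On the sample data this should keep the USA-UK, USA-Spain, and Germany-Spain rows in the order they first appear, which matches the layout of the required output in the question.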