How to Create a Flag Column Based on the Maximum Value of a Feature ID?

I have this dataset:

 data test;
 input Feature_ID Client_ID;
 cards;
 52004 541111
 56222 541111
 56300 541111
 73222 980002 
 73600 980002
 78006 980002
 85000 980002
 95001 1000001
 98020 1000001
 ;
 run;

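I want to create a flag column that is 1 on the row holding the maximum Feature_ID for each client, and 0 otherwise.

The result should look like this: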

 data test;
 input Feature_ID Client_ID Flag;
 cards;
 52004 541111 0
 56222 541111 0
 56300 541111 1
 73222 980002 0
 73600 980002 0
 78006 980002 0
 85000 980002 1
 95001 1000001 0
 98020 1000001 1
 ;
 run;

How can I do this?

Because the original data is not sorted, I first sorted it with Proc SQL, like this:

 proc sql;
     create table tab_Trial as select
           Feature_ID
          ,Client_ID
       from Test
       order by Feature_ID, Client_ID;
  quit;

Then I tried to create the flag column with this code:

 data Flagging; 
    set Tab_Trial;
    by Client_ID; 
    if Last.Feature_ID = 1 then Flag = 1;
    else Flag = 0;
 run; 

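But I get a Flag column filled with 0s. Any help would be greatly appreciated.

In proc sql, you can use GROUP BY to get the maximum Feature_ID per client and then CASE logic to assign the flag: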

proc sql;
    create table tab_Trial as
        /* Qualify the columns: Client_ID exists in both t and tc */
        select t.Feature_ID, t.Client_ID,
               (case when t.Feature_ID = tc.max_Feature_ID then 1 else 0 end) as flag
       from Test t join
            /* One row per client with its largest Feature_ID */
            (select Client_ID, max(Feature_ID) as max_Feature_ID
             from Test
             group by Client_ID
            ) tc
            on tc.Client_ID = t.Client_ID
       order by t.Feature_ID, t.Client_ID;
  quit;
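
A shorter variant of the same idea, offered as a sketch rather than a tested answer: PROC SQL can remerge a summary statistic back onto the detail rows, so the explicit join can be dropped. This assumes the default remerging behaviour is enabled (SAS writes a NOTE about remerging to the log), and the output table name `tab_Flag` is made up for illustration.

```sas
proc sql;
    create table tab_Flag as
        select Feature_ID, Client_ID,
               /* the comparison evaluates to 1 (true) or 0 (false) */
               (Feature_ID = max(Feature_ID)) as flag
        from Test
        group by Client_ID
        order by Client_ID, Feature_ID;
quit;
```

If two rows of a client tie for the maximum Feature_ID, both are flagged, which is also what the join version does.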

Try using last.variable, but first sort the dataset:

data test;
 input Feature_ID Client_ID;
 cards;
 52004 541111
 56300 541111
 56222 541111
 73222 980002 
 73600 980002
 85000 980002
 78006 980002
 98020 1000001
 95001 1000001
 ;
 run;


 proc sort data=test out=test_sorted;
 by Client_ID Feature_ID;
 run;


 data test1;
   set test_sorted;
   by Client_ID Feature_ID;
   if last.Client_Id then flag=1;
   else flag=0;
 run;

Input:

+------------+-----------+
| Feature_ID | Client_ID |
+------------+-----------+
|      52004 |    541111 |
|      56300 |    541111 |
|      56222 |    541111 |
|      73222 |    980002 |
|      73600 |    980002 |
|      85000 |    980002 |
|      78006 |    980002 |
|      98020 |   1000001 |
|      95001 |   1000001 |
+------------+-----------+

Sorted dataset:

+------------+-----------+
| Feature_ID | Client_ID |
+------------+-----------+
|      52004 |    541111 |
|      56222 |    541111 |
|      56300 |    541111 |
|      73222 |    980002 |
|      73600 |    980002 |
|      78006 |    980002 |
|      85000 |    980002 |
|      95001 |   1000001 |
|      98020 |   1000001 |
+------------+-----------+

Output:

+------------+-----------+------+
| Feature_ID | Client_ID | flag |
+------------+-----------+------+
|      52004 |    541111 |    0 |
|      56222 |    541111 |    0 |
|      56300 |    541111 |    1 |
|      73222 |    980002 |    0 |
|      73600 |    980002 |    0 |
|      78006 |    980002 |    0 |
|      85000 |    980002 |    1 |
|      95001 |   1000001 |    0 |
|      98020 |   1000001 |    1 |
+------------+-----------+------+
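
As for why the attempt in the question produces only 0s: as far as I can tell, SAS creates first. and last. variables only for variables named in the BY statement. With `by Client_ID;` there is no real `Last.Feature_ID`, so the reference is treated as an uninitialized (missing) variable and the condition is never true. The data was also ordered by Feature_ID first rather than by Client_ID. A minimal sketch of the corrected step, assuming the `test_sorted` dataset from the PROC SORT step above:

```sas
data Flagging;
   set test_sorted;          /* sorted by Client_ID, then Feature_ID */
   by Client_ID;
   /* last.Client_ID is 1 on the final row of each client, else 0 */
   flag = last.Client_ID;
run;
```

Because the rows within each client are in ascending Feature_ID order, the last row of the group is the one with the maximum.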

If your dataset is already sorted by client_id, no further sorting is needed; you can use a double DOW loop:

data have;
input Feature_ID Client_ID;
cards;
52004 541111
56222 541111
56300 541111
73222 980002 
73600 980002
78006 980002
85000 980002
95001 1000001
98020 1000001
;
run;

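data want;
/* First pass: read one client_id group and track its largest feature_id. */
/* When the loop ends, _n_ holds the number of rows in the group. */
do _n_ = 1 by 1 until(last.client_id);
  set have;
  by client_id;
  max_feature_id = max(feature_id,max_feature_id);
end;
/* Second pass: re-read the same rows and flag the one holding the maximum. */
do _n_ = 1 to _n_;
  set have;
  flag = (feature_id = max_feature_id);
  output;
end;
drop max_feature_id;
run;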