How to create a flag column based on the maximum value of a Feature ID?
I have this dataset:
data test;
input Feature_ID Client_ID;
cards;
52004 541111
56222 541111
56300 541111
73222 980002
73600 980002
78006 980002
85000 980002
95001 1000001
98020 1000001
;
run;
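I want to create a flag column that is 1 on the row holding the maximum Feature_ID for each client and 0 elsewhere.
The result should look like this: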
data test;
input Feature_ID Client_ID Flag;
cards;
52004 541111 0
56222 541111 0
56300 541111 1
73222 980002 0
73600 980002 0
78006 980002 0
85000 980002 1
95001 1000001 0
98020 1000001 1
;
run;
How can I do that?
What I did (since the original data is not sorted) was first sort the data with Proc SQL, like this:
proc sql;
create table tab_Trial as select
Feature_ID
,Client_ID
from Test
order by Feature_ID, Client_ID;
quit;
Then I tried to create the flag column with this code:
data Flagging;
set Tab_Trial;
by Client_ID;
if Last.Feature_ID = 1 then Flag = 1;
else Flag = 0;
run;
But I get a Flag column filled with 0s.
Any help would be appreciated.
In proc sql, you can use GROUP BY to get the maximum Feature_ID per client and then CASE logic to assign the flag:
proc sql;
create table tab_Trial as
select t.Feature_ID, t.Client_ID,
(case when Feature_ID = max_Feature_ID then 1 else 0 end) as flag
from Test t join
(select Client_ID, max(Feature_ID) as max_Feature_ID
from Test t
group by Client_ID
) tc
on tc.Client_ID = t.Client_ID
order by Feature_ID, Client_ID;
quit;
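A shorter variant, offered as a sketch rather than as the answerer's method: PROC SQL in SAS can remerge a summary statistic back onto the detail rows, so the subquery and join can be dropped. This assumes the `test` dataset from the question; the output name `tab_Remerge` is made up for illustration. SAS writes a note to the log saying the query requires remerging summary statistics with the original data, which is expected here.

```sas
proc sql;
  /* tab_Remerge is a hypothetical output name */
  create table tab_Remerge as
  select Feature_ID,
         Client_ID,
         /* max() is computed per Client_ID group and remerged onto each row;
            the comparison evaluates to 1 when true and 0 when false */
         (Feature_ID = max(Feature_ID)) as Flag
  from test
  group by Client_ID
  order by Client_ID, Feature_ID;
quit;
```

As with the join version, if a client has the same maximum Feature_ID on more than one row, every such row is flagged.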
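Try using last.variable, but first sort the dataset. FIRST. and LAST. variables exist only for variables named in the BY statement, so sort by Client_ID and then Feature_ID, and test last.Client_ID: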
data test;
input Feature_ID Client_ID;
cards;
52004 541111
56300 541111
56222 541111
73222 980002
73600 980002
85000 980002
78006 980002
98020 1000001
95001 1000001
;
run;
proc sort data=test out=test_sorted;
by Client_ID Feature_ID;
run;
data test1;
set test_sorted;
by Client_ID Feature_ID;
if last.Client_Id then flag=1;
else flag=0;
run;
Input:
+------------+-----------+
| Feature_ID | Client_ID |
+------------+-----------+
| 52004 | 541111 |
| 56300 | 541111 |
| 56222 | 541111 |
| 73222 | 980002 |
| 73600 | 980002 |
| 85000 | 980002 |
| 78006 | 980002 |
| 98020 | 1000001 |
| 95001 | 1000001 |
+------------+-----------+
Sorted dataset:
+------------+-----------+
| Feature_ID | Client_ID |
+------------+-----------+
| 52004 | 541111 |
| 56222 | 541111 |
| 56300 | 541111 |
| 73222 | 980002 |
| 73600 | 980002 |
| 78006 | 980002 |
| 85000 | 980002 |
| 95001 | 1000001 |
| 98020 | 1000001 |
+------------+-----------+
Output:
+------------+-----------+------+
| Feature_ID | Client_ID | flag |
+------------+-----------+------+
| 52004 | 541111 | 0 |
| 56222 | 541111 | 0 |
| 56300 | 541111 | 1 |
| 73222 | 980002 | 0 |
| 73600 | 980002 | 0 |
| 78006 | 980002 | 0 |
| 85000 | 980002 | 1 |
| 95001 | 1000001 | 0 |
| 98020 | 1000001 | 1 |
+------------+-----------+------+
If your dataset is already sorted by client_id, no further sorting is needed; you can use a double DOW loop:
data have;
input Feature_ID Client_ID;
cards;
52004 541111
56222 541111
56300 541111
73222 980002
73600 980002
78006 980002
85000 980002
95001 1000001
98020 1000001
;
run;
data want;
do _n_ = 1 by 1 until(last.client_id);
set have;
by client_id;
max_feature_id = max(feature_id,max_feature_id);
end;
do _n_ = 1 to _n_;
set have;
flag = feature_id = max_feature_id;
output;
end;
drop max_feature_id;
run;