Return the most frequent value within columns
I have a table in SAS with a customer_id and 5 columns containing the customer's monthly status. There are 6 possible statuses.
For example:
customer_id month1 month2 month3 month4 month5
12345678 Waiting Inactive Active Active Canceled
I want to return the most frequent value across the columns month1 to month5. In this case, that value is Active.
So the result would be:
customer_id frequent
12345678 Active
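Is there a SAS function for this? I know how to do it in SQL, but it gets very complicated in many cases. I'm new to SAS, so I assume there is a better solution.
A somewhat crude solution, but it does work when there are only 2 distinct values across the 5 months in this example. If Active appears 3 or more times, it is the most frequent value: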
select customer_id, case when (case when month1 = 'Active' then 1 else 0 end +
case when month2 = 'Active' then 1 else 0 end +
case when month3 = 'Active' then 1 else 0 end +
case when month4 = 'Active' then 1 else 0 end +
case when month5 = 'Active' then 1 else 0 end) >= 3
then 'Active' else 'Waiting' end
from tablename
Another way, using UNION ALL:
select customer_id, month, count(*) as cnt
from
(
select customer_id, month1 as month from tablename
UNION ALL
select customer_id, month2 from tablename
UNION ALL
select customer_id, month3 from tablename
UNION ALL
select customer_id, month4 from tablename
UNION ALL
select customer_id, month5 from tablename
) t
group by customer_id, month
order by cnt desc
fetch first 1 row only
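FETCH FIRST is ANSI SQL; some DBMS products use TOP or LIMIT instead. Note that a query ending in FETCH FIRST 1 ROW ONLY returns a single row overall, so it needs a filter on customer_id or a per-customer ranking when the table holds more than one customer.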
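Since the question is about SAS, here is a rough sketch of how the UNION ALL idea might look in PROC SQL, which has no FETCH FIRST. It assumes the table is called tablename as above; the column name status and the output table want are made up for illustration. The outer query relies on PROC SQL remerging the per-customer maximum back onto each row, and it keeps every status that ties for the top count, so a customer can appear more than once.

```sas
proc sql;
  create table want as
  select customer_id, status as frequent, cnt
  from (
    /* count how often each status occurs per customer */
    select customer_id, status, count(*) as cnt
    from (
      select customer_id, month1 as status from tablename
      union all
      select customer_id, month2 as status from tablename
      union all
      select customer_id, month3 as status from tablename
      union all
      select customer_id, month4 as status from tablename
      union all
      select customer_id, month5 as status from tablename
    )
    group by customer_id, status
  )
  group by customer_id
  /* max(cnt) is remerged per customer; ties are all kept */
  having cnt = max(cnt);
quit;
```

If exactly one row per customer is required, a tie-breaking rule still has to be added on top of this.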
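If you use an array to split each customer's history into one observation per month, you can use the summary functions in proc sql to get the most frequent status, and use the most recent month (assuming that is month 5) to break ties.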
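/* Step 1: reshape to one observation per customer per month */
data want1;
set have;
/* month1 -- month5 is a positional range: the columns must be adjacent */
array m(*) month1 -- month5;
do i = 1 to dim(m);
cid = customer_id;
frequent = m(i);
position = i;
output;
end;
keep cid frequent position;
run;
/* Step 2: count each status per customer and record its latest month */
proc sql;
create table want2 as select
cid as customer_id,
frequent,
max(position) as max_pos,
count(frequent) as count
from want1
group by cid, frequent;
quit;
/* Step 3: highest count first, most recent month breaks ties */
proc sort data = want2; by customer_id descending count descending max_pos; run;
/* Step 4: keep the first (winning) row for each customer */
data want3;
set want2;
by customer_id descending count descending max_pos;
if first.customer_id;
drop max_pos count;
run;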
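For only five columns, the same result can probably be reached in a single data step without reshaping. The following is a sketch under a few assumptions, not a tested drop-in: the input is called have as above, the month columns are character, and $20 is an assumed width for the new variable. The helper names n, best, i and j are invented here. Using >= in the comparison lets a later month win ties, which matches the tie-breaking rule used above.

```sas
data want;
  set have;
  array m(*) month1-month5;        /* assumes character status columns */
  length frequent $ 20;            /* assumed width, adjust to the data */
  best = 0;
  do i = 1 to dim(m);
    /* count how many months share the status of month i */
    n = 0;
    do j = 1 to dim(m);
      if m(j) = m(i) then n = n + 1;
    end;
    /* >= means the most recent month wins when counts are equal */
    if n >= best then do;
      best = n;
      frequent = m(i);
    end;
  end;
  keep customer_id frequent;
run;
```

For the sample row (Waiting, Inactive, Active, Active, Canceled) this keeps Active, since it is the only status counted twice. Blank statuses are counted like any other value, so they may need to be skipped explicitly.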