Return the most frequent value within columns

Question

I have a table in SAS with a customer_id and 5 columns holding the customer's status for each month. A customer can have one of 6 different statuses. For example:

customer_id   month1    month2    month3    month4    month5 
12345678      Waiting   Inactive  Active    Active    Canceled

I want to return the most frequent value across columns month1 - month5. In this case it is Active, so the result would be:

customer_id   frequent
12345678      Active

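Is there a SAS function for this? I know how to do it with SQL, but it gets very complicated in many cases. I am new to SAS, so I suspect there is a better solution.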

Answer 1

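A somewhat crude solution, but it does work when there are only 2 distinct values, here over 5 months. If the Active count is >= 3, Active is the most frequent value: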

select customer_id, case when (case when month1 = 'Active' then 1 else 0 end +
                               case when month2 = 'Active' then 1 else 0 end +
                               case when month3 = 'Active' then 1 else 0 end +
                               case when month4 = 'Active' then 1 else 0 end +
                               case when month5 = 'Active' then 1 else 0 end) >= 3
                             then 'Active' else 'Waiting' end as frequent
from tablename

Another way, using UNION ALL:

select customer_id, month, count(*) as cnt
from
(
    select customer_id, month1 as month from tablename
    UNION ALL
    select customer_id, month2 from tablename
    UNION ALL
    select customer_id, month3 from tablename
    UNION ALL
    select customer_id, month4 from tablename
    UNION ALL
    select customer_id, month5 from tablename
) as t
group by customer_id, month
order by cnt desc  -- highest count first
fetch first 1 row only

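Here FETCH FIRST is ANSI SQL; some DBMS products use TOP or LIMIT instead. Because it keeps a single row overall, filter on one customer_id (or rank within each customer) when the table holds more than one customer.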
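
For a per-customer result inside SAS itself, the same UNION ALL idea can be wrapped in PROC SQL. What follows is a minimal sketch under stated assumptions, not a tested drop-in: the data set is called tablename as in the queries above, the names status, cnt, u, c and want are made up for illustration, and the final step relies on PROC SQL remerging the summary statistic max(cnt) back onto the grouped rows. Ties are not broken, so a customer with two equally frequent statuses gets two rows.

```sas
proc sql;
    create table want as
    select customer_id, status as frequent, cnt
    from (
        /* count how often each status occurs per customer */
        select customer_id, status, count(*) as cnt
        from (
            /* stack the five month columns into one column */
            select customer_id, month1 as status from tablename
            union all
            select customer_id, month2 from tablename
            union all
            select customer_id, month3 from tablename
            union all
            select customer_id, month4 from tablename
            union all
            select customer_id, month5 from tablename
        ) as u
        group by customer_id, status
    ) as c
    group by customer_id
    /* keep only the status(es) whose count equals the customer's maximum */
    having cnt = max(cnt);
quit;
```

The log should show a note that the query requires remerging summary statistics; that is expected with this pattern.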

Answer 2

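If you use an array to split the data set into one observation per month of customer history, you can use the summary functions in proc sql to easily get the most frequent status, breaking ties in favor of the most recent month (assuming that is month 5).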

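/* Reshape: one output row per customer per month */
data want1;
    set have;
    /* numbered range list; does not depend on column order in the data set */
    array m(*) month1-month5;
    do i = 1 to dim(m);
        cid = customer_id;
        frequent = m(i);   /* status in month i */
        position = i;      /* month number, used later to break ties */
        output;
    end;
    keep cid frequent position;
run;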

proc sql;
    create table want2 as select
    cid as customer_id,
    frequent,
    max(position) as max_pos,
    count(frequent) as count
    from want1
    group by cid, frequent;
quit;

proc sort data = want2; by customer_id descending count descending max_pos; run;

data want3;
    set want2;
    by customer_id descending count descending max_pos;
    if first.customer_id;
    drop max_pos count;
run;
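
The same tie-breaking rule can also be sketched in a single DATA step, without reshaping or sorting. This is only an illustration under assumptions: the sample data set have is rebuilt from the row in the question, the variables i, j, cnt and best are made-up helper names, and the length of 32 for frequent is a guess that should be at least as wide as the longest status. For each month it counts how many months share that status and keeps the status with the highest count; using >= means a later month wins a tie, like max(position) above. Blank statuses are counted like any other value.

```sas
/* Sample data in the layout from the question */
data have;
    input customer_id (month1-month5) (:$8.);
    datalines;
12345678 Waiting Inactive Active Active Canceled
;
run;

data want;
    set have;
    array m(*) month1-month5;
    length frequent $ 32;   /* assumed width, adjust to your statuses */
    best = 0;
    do i = 1 to dim(m);
        /* count the months that have the same status as month i */
        cnt = 0;
        do j = 1 to dim(m);
            if m(j) = m(i) then cnt = cnt + 1;
        end;
        /* >= lets the most recent month win when counts are tied */
        if cnt >= best then do;
            best = cnt;
            frequent = m(i);
        end;
    end;
    keep customer_id frequent;
run;
```

For the sample row, Active occurs twice and every other status once, so frequent is Active.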

sql

sas

proc-sql