Get most commonly occurring value for each user id

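I have a table with userIds and a product category, prod. I want to get a table of unique userIds together with the most commonly occurring product category prod for each one. In other words, I want to know which category of items each customer buys most. How can I do this in PL/SQL or Oracle SQL?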

|userId|prod|
|------|----|
|123544|cars|
|123544|cars|
|123544|dogs|
|123544|cats|
|987689|bats|
|987689|cats|

I have seen SO questions about getting the most common value of a column, but how do I get the most common value for each unique userId?

You should solve this with plain SQL. If you really need it in PL/SQL, just embed the query in PL/SQL.
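For illustration, embedding could look roughly like the sketch below. It assumes the yourtable table created in the setup that follows, uses the tie-returning logic written as nested subqueries, and prints through DBMS_OUTPUT (so SET SERVEROUTPUT ON is needed in SQL*Plus). The column alias cnt and the loop variable r are names made up for this example.

```sql
-- Sketch only: an anonymous PL/SQL block that loops over the query result.
-- Assumes yourtable(userID, prod) exists; cnt and r are illustrative names.
begin
  for r in (
    select userID, prod, cnt
      from (
        select userID, prod, cnt,
               rank() over (partition by userID order by cnt desc) rnk
          from (
            -- one row per (userID, prod) with its number of occurrences
            select userID, prod, count(*) cnt
              from yourtable
             group by userID, prod
          )
      )
     where rnk = 1          -- keep the top-ranked prod(s); ties are all kept
     order by userID, prod
  ) loop
    dbms_output.put_line(r.userID || ' ' || r.prod || ' (' || r.cnt || ')');
  end loop;
end;
/
```

The cursor FOR loop opens, fetches, and closes the cursor implicitly, so nothing beyond the query itself has to be managed.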

(Setup)

  drop table yourtable;
  create table yourtable (
     userID   number,
     prod     varchar2(10)
     )
  /

  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'cars' );
  insert into yourtable values ( 123544, 'dogs' );
  insert into yourtable values ( 123544, 'cats' );
  insert into yourtable values ( 987689, 'bats' );
  insert into yourtable values ( 987689, 'cats' );

  commit;

-- Assuming ties are not broken, this logic returns both rows of a tie

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               rank() over (partition by userID order by rgrp desc) rnk
          from w_grp
        )
  select distinct userID, prod
    from w_rnk
   where rnk = 1
  /

      USERID PROD
  ---------- ----------
      987689 bats
      987689 cats
      123544 cars

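-- Assuming you want only 1 row per user: this returns an arbitrary one when there is a tie (i.e. this time it pulled 987689 bats, next time it might pull 987689 cats). It will always return 123544 cars, however, because there is no tie for that user.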

  with w_grp as (
        select userID, prod, count(*) over ( partition by userID, prod ) rgrp
          from yourtable
        ),
     w_rnk as (
        select userID, prod, rgrp,
               row_number() over (partition by userID order by rgrp desc) rnum
          from w_grp
        )
  select userID, prod, rnum
    from w_rnk
   where rnum = 1
  /

      USERID PROD             RNUM
  ---------- ---------- ----------
      123544 cars                1
      987689 bats                1
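If a repeatable result matters more than an arbitrary pick, one option is to add a tie-breaker to the ORDER BY. The sketch below is the same query with prod added as a second sort key. Using alphabetical order on prod as the tie-breaker is an assumption made for this example, not something the question asks for.

```sql
-- Sketch: same logic as above, with prod as an assumed tie-breaker
with w_grp as (
      select userID, prod, count(*) over ( partition by userID, prod ) rgrp
        from yourtable
      ),
   w_rnk as (
      select userID, prod, rgrp,
             -- highest count first; among equal counts, lowest prod first
             row_number() over (partition by userID order by rgrp desc, prod) rnum
        from w_grp
      )
select userID, prod
  from w_rnk
 where rnum = 1
 order by userID
/
```

With the sample data this should pick bats for 987689 on every run, since bats sorts before cats.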

[edit] Removed the unused rank/row_number from the queries to avoid confusion [/edit]

SELECT user_id, prod, prod_cnt FROM (
    SELECT user_id, prod, prod_cnt
         , RANK() OVER ( PARTITION BY user_id ORDER BY prod_cnt DESC ) AS rn
      FROM (
        SELECT user_id, prod, COUNT(*) AS prod_cnt
          FROM mytable
         GROUP BY user_id, prod
    )
) WHERE rn = 1;

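In the innermost subquery I get the COUNT of each product per user. I then rank them with the analytic (window) function RANK(), and simply select all rows whose rank equals 1. Using RANK() rather than ROW_NUMBER() ensures that ties are returned.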
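As a side note, Oracle also has a STATS_MODE aggregate that returns the most frequent value in a group. This is a different technique from the RANK() query above, shown only as a sketch against the same mytable(user_id, prod) names. The alias top_prod is made up for the example. When several values tie, it returns just one of them, so it is not a substitute for the RANK() version if ties must be reported.

```sql
-- Sketch: one row per user with its most frequent prod, via Oracle's STATS_MODE.
-- On a tie, Oracle returns only one of the tied values.
SELECT user_id, STATS_MODE(prod) AS top_prod
  FROM mytable
 GROUP BY user_id;
```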