SQL count distinct per group divided by count distinct of total

I have:

id value
1 123
1 124
1 125
2 126
2 127
2 127
3 128
3 128
3 128

I want an aggregation like this:

id distinct_count total_distinct percentage
1 3 6 0.5
2 2 6 0.33
3 1 6 0.167

I tried applying a window OVER clause like this:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       COUNT(DISTINCT value) OVER () AS total_distinct,
       COUNT(DISTINCT value) / COUNT(DISTINCT value) OVER () AS percentage
FROM have
GROUP BY id

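However, COUNT(DISTINCT ...) with an OVER clause does not seem to be implemented yet.

Is there a way to achieve this without a join?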

You can do this:

SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
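
As a quick sanity check, here is a minimal sketch that runs the scalar-subquery version against the sample data. It uses Python's built-in sqlite3 module with an in-memory database; SQLite is an assumption made only so the example is runnable, and the ORDER BY and the name subquery_rows are added for illustration. Other engines may format or round the result differently.

```python
import sqlite3

# In-memory SQLite database, used only for illustration (assumption: the
# question itself targets a different engine).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# The total is computed by an uncorrelated scalar subquery, so it is the
# same for every group; 0.0 + ... forces non-integer division.
query = """
SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       (0.0 + COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
ORDER BY id
"""
subquery_rows = conn.execute(query).fetchall()
for row in subquery_rows:
    print(row)
```

Each row should carry the same total_distinct of 6, with percentages of one half, one third, and one sixth for ids 1, 2, and 3.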

Or:

WITH cte AS (SELECT COUNT(DISTINCT value) AS value FROM have)
SELECT 
       id,
       COUNT(DISTINCT value) AS distinct_count,
       cte.value AS total_distinct,
       (0.0+COUNT(DISTINCT value)) / cte.value AS percentage
FROM have
CROSS APPLY cte
GROUP BY cte.value, id;
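
CROSS APPLY is not available in every engine. A more portable sketch of the same idea uses CROSS JOIN against the one-row CTE, shown here through Python's sqlite3 module (SQLite, the column alias total, and the name cte_rows are assumptions for illustration). Strictly speaking this is still a join, although it only attaches a single row to every group.

```python
import sqlite3

# In-memory SQLite database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# The CTE yields exactly one row, so the cross join does not multiply rows;
# it only makes the total available next to each group.
query = """
WITH cte AS (SELECT COUNT(DISTINCT value) AS total FROM have)
SELECT h.id,
       COUNT(DISTINCT h.value) AS distinct_count,
       cte.total AS total_distinct,
       (0.0 + COUNT(DISTINCT h.value)) / cte.total AS percentage
FROM have AS h
CROSS JOIN cte
GROUP BY cte.total, h.id
ORDER BY h.id
"""
cte_rows = conn.execute(query).fetchall()
for row in cte_rows:
    print(row)
```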

Another approach is to enumerate the values and use conditional aggregation:

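SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       -- seqnum_v = 1 marks each value only once in the whole table, so the
       -- per-id counts must be summed across all groups with a window SUM
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct_count,
       (SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END) /
        SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0 END)) OVER ()
       ) AS ratio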
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) as seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) as seqnum_v
      FROM have h
     ) h
GROUP BY id;

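This may be faster than the approaches that use subqueries, since the table is only scanned once, but it is worth checking on your own data.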
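
To try the enumeration approach on the sample data, here is a minimal sketch using Python's sqlite3 module (SQLite 3.25 or later is assumed, since it needs window functions; the name enum_rows and the ORDER BY are added for illustration). Note that seqnum_v = 1 flags each value once across the whole table, so this sketch adds up the per-id counts with SUM(...) OVER () to obtain a total that covers every id.

```python
import sqlite3

# In-memory SQLite database with the sample data from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# seqnum_iv = 1 flags the first row of each (id, value) pair;
# seqnum_v = 1 flags the first row of each value in the whole table.
# The window SUM over the grouped result adds the per-id counts together.
query = """
SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0.0 END)
         / SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0.0 END)) OVER () AS ratio
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) AS seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS seqnum_v
      FROM have h) AS t
GROUP BY id
ORDER BY id
"""
enum_rows = conn.execute(query).fetchall()
for row in enum_rows:
    print(row)
```

Without the outer window SUM, the second column would only count the values whose first occurrence happens to fall in that id, which is not the table-wide total.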