SQL count distinct per group divided by count distinct of total
I have:

| id | value |
|---|---|
| 1 | 123 |
| 1 | 124 |
| 1 | 125 |
| 2 | 126 |
| 2 | 127 |
| 2 | 127 |
| 3 | 128 |
| 3 | 128 |
| 3 | 128 |
I want an aggregation like this:

| id | distinct_count | total_distinct | percentage |
|---|---|---|---|
| 1 | 3 | 6 | 0.5 |
| 2 | 2 | 6 | 0.33 |
| 3 | 1 | 6 | 0.167 |
I tried applying a window OVER clause like this:
SELECT id,
COUNT(DISTINCT value) AS distinct_count,
COUNT(DISTINCT value) OVER () AS total_distinct,
COUNT(DISTINCT value) / COUNT(DISTINCT value) OVER () AS percentage
FROM have
GROUP BY id
But it seems this is not implemented yet.
Is there a way to achieve this without a join?
You can do it like this:
SELECT id,
COUNT(DISTINCT value) AS distinct_count,
(SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
(0.0+COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
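To check the scalar-subquery query against the sample data, here is a minimal sketch that runs it in an in-memory SQLite database through Python's `sqlite3` module. The choice of SQLite, the `ORDER BY id`, and the Python variable names are assumptions for illustration; the question does not name a database engine.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# The scalar subquery supplies the overall distinct count on every row;
# 0.0 + ... forces floating-point division instead of integer division.
query = """
SELECT id,
       COUNT(DISTINCT value) AS distinct_count,
       (SELECT COUNT(DISTINCT value) FROM have) AS total_distinct,
       (0.0 + COUNT(DISTINCT value)) / (SELECT COUNT(DISTINCT value) FROM have) AS percentage
FROM have
GROUP BY id
ORDER BY id
"""
subquery_rows = conn.execute(query).fetchall()
for row in subquery_rows:
    print(row)
```

Each returned tuple is `(id, distinct_count, total_distinct, percentage)`, so the first three columns can be compared directly with the desired table in the question.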
Or:
WITH cte AS (SELECT COUNT(DISTINCT value) AS value FROM have)
SELECT
have.id,
-- value is qualified because cte also exposes a column named value
COUNT(DISTINCT have.value) AS distinct_count,
cte.value AS total_distinct,
(0.0+COUNT(DISTINCT have.value)) / cte.value AS percentage
FROM have
CROSS APPLY cte
GROUP BY cte.value, have.id;
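`CROSS APPLY` is not available in every engine; SQLite, for example, does not support it. As a sketch of the same idea under that assumption, the one-row CTE can be attached with `CROSS JOIN` instead. This is technically a join, but against a single row, so it only adds one constant column. The alias `total` and the table alias `h` are hypothetical names chosen here to avoid a clash with the `value` column.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# The CTE yields exactly one row, so CROSS JOIN repeats the overall
# distinct count next to every row of the base table.
query = """
WITH cte AS (SELECT COUNT(DISTINCT value) AS total FROM have)
SELECT h.id,
       COUNT(DISTINCT h.value) AS distinct_count,
       cte.total AS total_distinct,
       (0.0 + COUNT(DISTINCT h.value)) / cte.total AS percentage
FROM have AS h
CROSS JOIN cte
GROUP BY h.id, cte.total
ORDER BY h.id
"""
cte_rows = conn.execute(query).fetchall()
for row in cte_rows:
    print(row)
```

Compared with the scalar-subquery form, the total is written once in the CTE rather than repeated in both the `total_distinct` and `percentage` expressions.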
Another approach is to enumerate the values and use conditional aggregation:
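SELECT id,
SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) as distinct_count,
-- Without the outer window SUM, GROUP BY id would only count the first occurrences that fall inside each id.
SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () as total_distinct_count,
(SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END) /
SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0 END)) OVER ()
) as ratio
FROM (SELECT h.*,
-- seqnum_iv = 1 marks the first row of each (id, value) pair; seqnum_v = 1 marks the first row of each value overall.
ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) as seqnum_iv,
ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) as seqnum_v
FROM have h
) h
GROUP BY id;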
This may be faster than the approach that uses subqueries.
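The row-numbering idea can be tried out with the sketch below, again assuming an in-memory SQLite database driven from Python (window functions need SQLite 3.25 or later). One detail is worth spelling out: after `GROUP BY id`, a plain `SUM` over the `seqnum_v = 1` flags only counts the first occurrences that land in that group, so this sketch wraps it in `SUM(...) OVER ()` to add the per-group counts into the overall total.

```python
import sqlite3

# In-memory SQLite database holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE have (id INTEGER, value INTEGER)")
conn.executemany(
    "INSERT INTO have (id, value) VALUES (?, ?)",
    [(1, 123), (1, 124), (1, 125),
     (2, 126), (2, 127), (2, 127),
     (3, 128), (3, 128), (3, 128)],
)

# seqnum_iv = 1 flags one row per (id, value) pair; seqnum_v = 1 flags one
# row per value across the whole table. The window SUM over the grouped
# rows turns the per-id flag counts into the table-wide distinct count.
query = """
SELECT id,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1 ELSE 0 END) AS distinct_count,
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1 ELSE 0 END)) OVER () AS total_distinct_count,
       SUM(CASE WHEN seqnum_iv = 1 THEN 1.0 ELSE 0 END) /
       SUM(SUM(CASE WHEN seqnum_v = 1 THEN 1.0 ELSE 0 END)) OVER () AS ratio
FROM (SELECT h.*,
             ROW_NUMBER() OVER (PARTITION BY id, value ORDER BY value) AS seqnum_iv,
             ROW_NUMBER() OVER (PARTITION BY value ORDER BY value) AS seqnum_v
      FROM have h
     ) h
GROUP BY id
ORDER BY id
"""
rownum_rows = conn.execute(query).fetchall()
for row in rownum_rows:
    print(row)
```

Whether this actually beats the subquery versions depends on the engine and the data, so it is worth comparing execution plans on the real table before settling on one form.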