Find the most frequently occurring Col3 for every pair of Col1 and Col2
Given a table myTable with 4 columns, say Col1, Col2, Col3 and Col4:
A X 5 B
A Y 5 C
A X 7 D
A Y 3 E
A X 7 F
For every pair (col1, col2), I need to find the col3 value that occurs most often.
So the result for this example would be:
A X 7 D/F -- D or F
A Y 5/3 C/E -- It can be 5 and C or 3 and E
So I wrote a query along these lines:
select Col1,Col2,Col3
from myTable M
group by Col1,Col2,Col3
having Col3 =
(select Col3
from myTable N
where M.Col1=N.col1
group by Col3
order by Col3 desc limit 1);
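But the query does not give the desired result.
Also, I don't know how to get Col4, because I don't want to add Col4 to the group by clause.
For each (Col1, Col2) pair, I want a single Col4 that goes with the most frequently occurring Col3.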
One way is to use the row_number() window function on top of an aggregate query:
SELECT col1, col2, col3
FROM (SELECT col1, col2, col3,
ROW_NUMBER () OVER (PARTITION BY col1, col2 ORDER BY cnt DESC) AS rn
FROM (SELECT col1, col2, col3, COUNT(*) AS cnt
FROM mytable
GROUP BY col1, col2, col3) t
) q
WHERE rn = 1
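To try this against the question's sample rows, here is a minimal sketch. The inline CTE standing in for the table, the column types implied by the literals, and the extra col3 in the window's ORDER BY are my assumptions; the last one only makes the pick deterministic when counts tie, since otherwise the winner among tied values is arbitrary.

```sql
-- Sample rows from the question, inlined as a CTE so the statement runs standalone.
WITH mytable (col1, col2, col3, col4) AS (
   VALUES ('A', 'X', 5, 'B')
        , ('A', 'Y', 5, 'C')
        , ('A', 'X', 7, 'D')
        , ('A', 'Y', 3, 'E')
        , ('A', 'X', 7, 'F')
   )
SELECT col1, col2, col3
FROM  (SELECT col1, col2, col3,
              ROW_NUMBER() OVER (PARTITION BY col1, col2
                                 ORDER BY cnt DESC, col3) AS rn  -- col3 breaks ties
       FROM  (SELECT col1, col2, col3, COUNT(*) AS cnt
              FROM   mytable
              GROUP  BY col1, col2, col3) t
      ) q
WHERE  rn = 1
ORDER  BY col1, col2;
-- (A, X): 7 occurs twice, so it wins       -> A | X | 7
-- (A, Y): 3 and 5 tie, the smaller is kept -> A | Y | 3
```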
In Postgres, a single query with DISTINCT ON is all you need:
SELECT DISTINCT ON (col1, col2)
col1, col2, col3, min(col4) As col4
FROM tbl
GROUP BY col1, col2, col3
ORDER BY col1, col2, count(*) DESC, col3;
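This way you get a single row per (col1, col2) with the most common col3 (the smallest value if several tie for "most common") and the smallest col4 that goes along with that col3.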
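As a quick check on the question's sample rows, the following sketch runs the same DISTINCT ON query over an inline CTE. The CTE (named tbl to match the query) and the extra ct output column are assumptions added for illustration only.

```sql
-- Sample rows from the question, inlined as a CTE so the statement runs standalone.
WITH tbl (col1, col2, col3, col4) AS (
   VALUES ('A', 'X', 5, 'B')
        , ('A', 'Y', 5, 'C')
        , ('A', 'X', 7, 'D')
        , ('A', 'Y', 3, 'E')
        , ('A', 'X', 7, 'F')
   )
SELECT DISTINCT ON (col1, col2)
       col1, col2, col3, min(col4) AS col4, count(*) AS ct
FROM   tbl
GROUP  BY col1, col2, col3
ORDER  BY col1, col2, count(*) DESC, col3;
-- (A, X): 7 occurs twice, min(col4) among its rows is D    -> A | X | 7 | D | 2
-- (A, Y): 3 and 5 tie with one row each, smaller col3 wins -> A | Y | 3 | E | 1
```

DISTINCT ON is applied after GROUP BY and keeps the first row of each (col1, col2) group in ORDER BY order, which is why the leading ORDER BY expressions have to match the DISTINCT ON list.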
Similarly, to get all qualifying col3 values, you can use the window function rank() in a subquery. It is also evaluated after the aggregation:
SELECT col1, col2, col3, col4_list
FROM (
SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
, rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rnk = 1
ORDER BY col1, col2, col3;
This works because you can run window functions over aggregate functions.
Cast col4 to text if its data type is not a character type.
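One detail worth illustrating: without an ORDER BY inside string_agg(), the order of items in col4_list is not guaranteed. The sketch below adds one and runs the rank() query on the question's sample rows; the inline CTE and the ordered aggregate are my additions, not part of the query above.

```sql
-- Sample rows from the question, inlined as a CTE so the statement runs standalone.
WITH tbl (col1, col2, col3, col4) AS (
   VALUES ('A', 'X', 5, 'B')
        , ('A', 'Y', 5, 'C')
        , ('A', 'X', 7, 'D')
        , ('A', 'Y', 3, 'E')
        , ('A', 'X', 7, 'F')
   )
SELECT col1, col2, col3, col4_list
FROM  (
   SELECT col1, col2, col3
        , string_agg(col4, '/' ORDER BY col4) AS col4_list  -- sorted, hence repeatable
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
ORDER  BY col1, col2, col3;
-- A | X | 7 | D/F
-- A | Y | 3 | E      (3 and 5 tie, rank() keeps both)
-- A | Y | 5 | C
```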
Or, to get all qualifying col3 values per (col1, col2) in one list, plus all matching col4 values in a second list:
SELECT col1, col2
, string_agg(col3::text, '/') AS col3_list -- cast if necessary
, string_agg(col4_list, '/') AS col4_list
FROM (
SELECT col1, col2, col3, count(*) AS ct, string_agg(col4, '/') AS col4_list
, rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rnk = 1
GROUP BY col1, col2
ORDER BY col1, col2, col3_list;
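A possible refinement, sketched on the question's sample rows: ordering both outer aggregates by col3 keeps the two lists aligned position by position, and using a different separator for the inner list keeps the col4 groups distinguishable. The inline CTE, the ',' separator, and the ORDER BY clauses inside the aggregates are assumptions for illustration.

```sql
-- Sample rows from the question, inlined as a CTE so the statement runs standalone.
WITH tbl (col1, col2, col3, col4) AS (
   VALUES ('A', 'X', 5, 'B')
        , ('A', 'Y', 5, 'C')
        , ('A', 'X', 7, 'D')
        , ('A', 'Y', 3, 'E')
        , ('A', 'X', 7, 'F')
   )
SELECT col1, col2
     , string_agg(col3::text, '/' ORDER BY col3) AS col3_list
     , string_agg(col4_list,  '/' ORDER BY col3) AS col4_list  -- same order as col3_list
FROM  (
   SELECT col1, col2, col3
        , string_agg(col4, ',' ORDER BY col4) AS col4_list     -- ',' within one col3
        , rank() OVER (PARTITION BY col1, col2 ORDER BY count(*) DESC) AS rnk
   FROM   tbl
   GROUP  BY col1, col2, col3
   ) sub
WHERE  rnk = 1
GROUP  BY col1, col2
ORDER  BY col1, col2;
-- A | X | 7   | D,F
-- A | Y | 3/5 | E/C
```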
Related answers with more explanation:
- Select first row in each GROUP BY group?
- Best way to get result count before LIMIT was applied
- Get the distinct sum of a joined table column
Solution for Amazon Redshift
row_number() is available, so this should work:
SELECT col1, col2, col3, col4
FROM (
SELECT col1, col2, col3, min(col4) AS col4
, row_number() OVER (PARTITION BY col1, col2
ORDER BY count(*) DESC, col3) AS rn
FROM tbl
GROUP BY col1, col2, col3
) sub
WHERE rn = 1
ORDER BY col1, col2;
Or, if window functions over aggregate functions are not allowed, use another subquery:
SELECT col1, col2, col3, col4
FROM (
SELECT *, row_number() OVER (PARTITION BY col1, col2
ORDER BY ct DESC, col3) AS rn
FROM (
SELECT col1, col2, col3, min(col4) AS col4, COUNT(*) AS ct
FROM tbl
GROUP BY col1, col2, col3
) sub1
) sub2
WHERE rn = 1;
This picks the smallest col3 if more than one value ties for the maximum count, and the smallest col4 for the respective col3.
SQL Fiddle demonstrating all of the above in Postgres 9.3.