Get rows with maximum count per one column - while grouping by two columns
I'm trying to get the maximum count for a field.
This is what I have, and what I'm trying to do.
| col1 | col2 |
| A | B |
| A | B |
| A | D |
| A | D |
| A | D |
| C | F |
| C | G |
| C | F |
I'm trying to get the col2 value with the most occurrences, grouped by col1.
With this query I get the occurrences grouped by col1 and col2.
SELECT col1, col2, count(*) as conta
FROM tab
WHERE ...
GROUP BY col1, col2
ORDER BY col1, col2
Then I get:
| col1 | col2 | conta |
| A | B | 2 |
| A | D | 3 |
| C | F | 2 |
| C | G | 1 |
Then I use this query to get the max count:
SELECT max(conta) as conta2, col1
FROM (
    SELECT col1, col2, count(*) as conta
    FROM tab
    WHERE ...
    GROUP BY col1, col2
    ORDER BY col1, col2
) AS derivedTable
GROUP BY col1
Then I get:
| col1 | conta |
| A | 3 |
| C | 2 |
What I'm missing is the value of col2. I'd like something like this:
| col1 | col2 | conta |
| A | D | 3 |
| C | F | 2 |
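The problem is that if I try to select the col2 field, I get an error message saying I have to use this field in the GROUP BY or in an aggregate function, but adding it to the GROUP BY doesn't give me what I want.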
I misunderstood the question. Here is a solution:
-- cnt and rn replace the original aliases Count and row, which clash with SQL keywords
with tablex as (
    select col1, col2, count(col2) as cnt
    from Your_Table
    group by col1, col2
),
aaaa as (
    select row_number() over (partition by col1 order by cnt desc) as rn, * from tablex
)
select * from aaaa where rn = 1
Probably not the most elegant solution, but a common table expression (CTE) can help.
with cte as (
select col1, col2, count(*) as total
from dtable
group by col1, col2
)
select col1, col2, total
from cte c
where total = (select max(total)
from cte cc
where cc.col1 = c.col1)
order by col1 asc
Returns
col1|col2|total|
----+----+-----+
A | D | 3|
C | F | 2|
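To try the correlated-subquery approach without a database server, here is a minimal sketch that runs the same query through Python's built-in `sqlite3` module. It assumes SQLite is an acceptable stand-in for your database, and it uses the question's table name `tab` instead of `dtable`. The helper `top_per_group` is a hypothetical name used only for this illustration.

```python
import sqlite3

# Sample rows copied from the question.
ROWS = [("A", "B"), ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
        ("C", "F"), ("C", "G"), ("C", "F")]

QUERY = """
WITH cte AS (
    SELECT col1, col2, count(*) AS total
    FROM tab
    GROUP BY col1, col2
)
SELECT col1, col2, total
FROM cte c
WHERE total = (SELECT max(total) FROM cte cc WHERE cc.col1 = c.col1)
ORDER BY col1, col2
"""


def top_per_group(rows):
    """Load rows into an in-memory SQLite table and run QUERY against it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


print(top_per_group(ROWS))
# One extra ("C", "G") row makes F and G tie within group C.
print(top_per_group(ROWS + [("C", "G")]))
```

Note that comparing against `max(total)` keeps every row that reaches the maximum, so the second call returns both tied rows for group C.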
You can combine GROUP BY with a window function; window functions are evaluated after the GROUP BY:
with cte as (
SELECT col1, col2,
count(*) as conta,
dense_rank() over (partition by col1 order by count(*) desc) as rnk
FROM tab
WHERE ...
GROUP by col1, col2
)
select col1, col2, conta
from cte
where rnk = 1
order by col1, col2;
If several (col1, col2) combinations tie for the highest count within the same col1, this returns all of them. If you don't want that, use row_number() instead of dense_rank().
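The difference between the two ranking functions only shows up when there is a tie, so here is a small sketch with tied data, again run through Python's `sqlite3` as an assumed stand-in (window functions need SQLite 3.25 or newer). Two things differ from the query above and are my own choices: the counts are computed in a separate CTE first, and `col2` is added as a tie-breaker to the `row_number()` ordering so the result is deterministic. The extra ("C", "G") row and the helper name `ranked_rows` are also made up for the illustration.

```python
import sqlite3

# Question's sample rows plus one extra ("C", "G") row, added here so that
# F and G tie at 2 within group C.
ROWS = [("A", "B"), ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
        ("C", "F"), ("C", "G"), ("C", "F"), ("C", "G")]

QUERY = """
WITH counts AS (
    SELECT col1, col2, count(*) AS conta
    FROM tab
    GROUP BY col1, col2
),
ranked AS (
    SELECT col1, col2, conta,
           dense_rank() OVER (PARTITION BY col1 ORDER BY conta DESC) AS rnk,
           row_number() OVER (PARTITION BY col1 ORDER BY conta DESC, col2) AS rn
    FROM counts
)
SELECT col1, col2, conta, rnk, rn
FROM ranked
WHERE rnk = 1
ORDER BY col1, col2
"""


def ranked_rows(rows):
    """Return every top-ranked row together with its rnk and rn values."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tab (col1 TEXT, col2 TEXT)")
    conn.executemany("INSERT INTO tab VALUES (?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


rows = ranked_rows(ROWS)
with_ties = [r[:3] for r in rows]                    # filter on rnk = 1
one_per_group = [r[:3] for r in rows if r[4] == 1]   # filter on rn = 1
print(with_ties)
print(one_per_group)
```

Filtering on `rnk = 1` keeps both C rows, while filtering on `rn = 1` keeps exactly one row per col1.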
Using a window function:
select distinct on (col1) col1, col2, cnt
from
(
select col1, col2, count(*) over (partition by col1, col2) cnt
from the_table
) t
order by col1, cnt desc;
| col1 | col2 | cnt |
| A | D | 3 |
| C | F | 2 |
This solution does not handle ties.
Simpler and faster (and correct):
SELECT DISTINCT ON (col1)
col1, col2, count(*) AS conta
FROM tab
GROUP BY col1, col2
ORDER BY col1, conta DESC;
db<>fiddle here (based on a_horse's fiddle)
DISTINCT ON is applied after aggregation, so we don't need a subquery or CTE. Consider the sequence of events in a SELECT query:
- Best way to get result count before LIMIT was applied
- Select first row in each GROUP BY group?
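To make that sequence of events concrete, here is a plain-Python emulation of the logical steps of the DISTINCT ON query: aggregate first, then sort, then keep the first row per col1. This is only a sketch of the logical order, not how PostgreSQL actually executes the query, and `distinct_on_col1` is a hypothetical helper name.

```python
from collections import Counter
from itertools import groupby

# Sample rows copied from the question.
ROWS = [("A", "B"), ("A", "B"), ("A", "D"), ("A", "D"), ("A", "D"),
        ("C", "F"), ("C", "G"), ("C", "F")]


def distinct_on_col1(rows):
    # Step 1, GROUP BY col1, col2 with count(*): one entry per (col1, col2).
    counts = Counter(rows)
    grouped = [(col1, col2, conta) for (col1, col2), conta in counts.items()]
    # Step 2, ORDER BY col1, conta DESC.
    grouped.sort(key=lambda r: (r[0], -r[2]))
    # Step 3, DISTINCT ON (col1): keep only the first row of each col1 run.
    return [next(group) for _, group in groupby(grouped, key=lambda r: r[0])]


print(distinct_on_col1(ROWS))
```

Because the aggregation has already happened by the time rows are deduplicated, no subquery is needed. In PostgreSQL, which of several tied rows comes first is not guaranteed unless you add more ORDER BY expressions; in this emulation the first-seen row wins only because Python's sort is stable.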