Find the most frequent value per group in a table column

Question

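I need to find the most common value of object_of_search for each ethnicity. How can I do that? Subqueries in the SELECT clause and correlated subqueries are not allowed. I tried something like this: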

mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"

But this does not aggregate: I get many rows per ethnicity, one for each object_of_search:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms
 ethnicity3                |                 2 |              100 | Firearms
 ethnicity1                |                 5 |               60 | Cat
 ethnicity1                |                 5 |               60 | Dog
 ethnicity2                |                 3 | 66.6666666666667 | Firearms
 ethnicity1                |                 5 |               60 | Psychoactive substances
 ethnicity1                |                 5 |               60 | Fireworks

It should look like this:

 officer_defined_ethnicity | Sas for ethnicity |   Arrest rate    | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
 ethnicity2                |                 3 | 66.6666666666667 | Stolen goods
 ethnicity3                |                 2 |              100 | Fireworks
 ethnicity1                |                 5 |               60 | Firearms

Table fiddle.
Query:

SELECT DISTINCT
    stopAndSearches.officer_defined_ethnicity,
    count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
    sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
       OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
       count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
    mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;

Table:

CREATE TABLE IF NOT EXISTS stopAndSearches(
    "sas_id" bigserial PRIMARY KEY,
    "officer_defined_ethnicity" VARCHAR(255),
    "object_of_search" VARCHAR(255),
    "outcome" VARCHAR(255)
);

Answer 1

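Update: Fiddle

This should solve the specific "object per ethnicity" question.

Note that this does not handle ties in the counts. That was not part of the question/request.

Adjust your SQL to include this logic to provide the details: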

WITH cte AS (
        SELECT officer_defined_ethnicity
             , object_of_search
             , COUNT(*) AS n
             , ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
     )
SELECT * FROM cte
 WHERE rn = 1
;

Result:

 officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
 ethnicity1                | Cat              | 1 |  1
 ethnicity2                | Stolen goods     | 2 |  1
 ethnicity3                | Fireworks        | 1 |  1
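
If tied values should all be reported rather than one of them being picked arbitrarily, one possible variation is to swap ROW_NUMBER() for RANK(), which assigns the same rank to equal counts. This is a minimal sketch against the question's stopAndSearches table and is not part of the original answer; the alias rnk is made up for illustration.

```sql
-- Sketch: return every object_of_search that shares the top count per ethnicity.
WITH cte AS (
        SELECT officer_defined_ethnicity
             , object_of_search
             , COUNT(*) AS n
               -- RANK() gives equal counts the same rank, so all ties get rnk = 1
             , RANK() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rnk
          FROM stopAndSearches
         GROUP BY officer_defined_ethnicity, object_of_search
     )
SELECT officer_defined_ethnicity, object_of_search, n
  FROM cte
 WHERE rnk = 1
 ORDER BY officer_defined_ethnicity, object_of_search;
```

For a group in which two values share the highest count, both rows come back, so the result can contain more than one row per ethnicity.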

Answer 2

SELECT DISTINCT ON (1)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY 1, 2
ORDER  BY 1, 3 DESC, 2;

Or, more explicitly:

SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;

 officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
 ethnicity1                | Cat              | 1
 ethnicity2                | Stolen goods     | 2
 ethnicity3                | Firearms         | 1

db<>fiddle here

Because DISTINCT ON is applied after GROUP BY, we need only a single query level.

Aggregate with GROUP BY to get the count for each (officer_defined_ethnicity, object_of_search).
Select the row with the highest count per officer_defined_ethnicity with DISTINCT ON.
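
To make the two steps visible, the single-level query can be read as roughly equivalent to the two-level form below. This is only an illustrative sketch; the subquery alias counts is made up here, and it assumes the same stop_and_searches table as the queries above.

```sql
-- Step 2: keep the first row per ethnicity according to ORDER BY.
SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity, object_of_search, ct
FROM  (
   -- Step 1: one row per (ethnicity, object) pair with its count.
   SELECT officer_defined_ethnicity, object_of_search, count(*) AS ct
   FROM   stop_and_searches
   GROUP  BY officer_defined_ethnicity, object_of_search
   ) AS counts
ORDER  BY officer_defined_ethnicity, ct DESC, object_of_search;
```

The single-level version folds both steps into one SELECT, which works because DISTINCT ON is evaluated after the aggregation.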

I added object_of_search as the third ORDER BY item to act as a tiebreaker and produce a deterministic result:
in case of a tie, pick the alphabetically first object_of_search.
Adapt to your needs.
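
As one example of adapting the tiebreaker, the sketch below prefers, among equally frequent values, the one with the highest sas_id. Treating a higher sas_id as "more recently recorded" is an assumption based on the bigserial column in the question, not something the answer states.

```sql
-- Sketch: break ties by the largest sas_id seen for each object_of_search.
SELECT DISTINCT ON (officer_defined_ethnicity)
       officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity, object_of_search
ORDER  BY officer_defined_ethnicity
        , ct DESC            -- highest count first
        , max(sas_id) DESC;  -- assumed tiebreaker: latest inserted row wins
```

Because sas_id is the primary key, max(sas_id) differs between any two groups, so the result stays deterministic.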

See:

Select first row in each GROUP BY group?
Best way to get result count before LIMIT was applied

Simpler and typically faster than a subquery with row_number():

Select first row in each GROUP BY group? - Benchmarks
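
For the three-column output shown in the question, it may also be enough to group by officer_defined_ethnicity alone, so that mode() sees all rows of an ethnicity at once instead of one row per sas_id. This is a hedged sketch rather than part of the answer above: it assumes PostgreSQL 9.4 or later (for mode() and FILTER) and uses this answer's stop_and_searches table name. Note that mode() picks arbitrarily among equally frequent values.

```sql
-- Sketch: one row per ethnicity using plain aggregates, no window functions.
SELECT officer_defined_ethnicity
     , count(*) AS "Sas for ethnicity"
       -- 100.0 forces numeric division instead of integer division
     , 100.0 * count(*) FILTER (WHERE outcome = 'Arrest') / count(*) AS "Arrest rate"
     , mode() WITHIN GROUP (ORDER BY object_of_search) AS "Most frequent object of search"
FROM   stop_and_searches
GROUP  BY officer_defined_ethnicity;
```

If a deterministic tiebreaker matters, the DISTINCT ON form above gives explicit control over it, which mode() does not.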

sql

postgresql

greatest-n-per-group