Find the most frequent value per group in a table column
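I need to find the most frequent value of object_of_search for each ethnicity. How can I do that? Subqueries and correlated subqueries in the SELECT clause are not allowed. Something like this: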
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
But this does not aggregate. There are many rows for each ethnicity, one per object_of_search:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
ethnicity3 | 2 | 100 | Firearms
ethnicity1 | 5 | 60 | Cat
ethnicity1 | 5 | 60 | Dog
ethnicity2 | 3 | 66.6666666666667 | Firearms
ethnicity1 | 5 | 60 | Psychoactive substances
ethnicity1 | 5 | 60 | Fireworks
It should look like this:
officer_defined_ethnicity | Sas for ethnicity | Arrest rate | Most frequent object of search
---------------------------+-------------------+------------------+--------------------------------
ethnicity2 | 3 | 66.6666666666667 | Stolen goods
ethnicity3 | 2 | 100 | Fireworks
ethnicity1 | 5 | 60 | Firearms
Table fiddle.
Query:
SELECT DISTINCT
stopAndSearches.officer_defined_ethnicity,
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity) AS "Sas for ethnicity",
sum(case when stopAndSearches.outcome = 'Arrest' then 1 else 0 end)
OVER (PARTITION BY stopAndSearches.officer_defined_ethnicity)::float /
count(stopAndSearches.sas_id) OVER(PARTITION BY stopAndSearches.officer_defined_ethnicity)::float * 100 AS "Arrest rate",
mode() WITHIN GROUP (ORDER BY stopAndSearches.object_of_search) AS "Most frequent object of search"
FROM stopAndSearches
GROUP BY stopAndSearches.sas_id, stopAndSearches.officer_defined_ethnicity;
Table:
CREATE TABLE IF NOT EXISTS stopAndSearches(
"sas_id" bigserial PRIMARY KEY,
"officer_defined_ethnicity" VARCHAR(255),
"object_of_search" VARCHAR(255),
"outcome" VARCHAR(255)
);
Update: Fiddle
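The output above has one row per (ethnicity, object) pair because the query groups by sas_id, the primary key. Every group therefore holds exactly one row, and mode() over a one-row group simply returns that row's own object_of_search. SELECT DISTINCT then collapses the result to the distinct (ethnicity, object) pairs. Grouping by officer_defined_ethnicity alone, with plain aggregates in place of the window functions, yields one row per ethnicity instead.

The sketch below imitates both groupings in plain Python; it is an illustration, not how PostgreSQL evaluates the query. Assumptions:
- The sample rows are hypothetical, reconstructed from the counts shown in the question.
- `mode_per_group` is a made-up helper standing in for mode().

```python
from collections import Counter, defaultdict

# Hypothetical sample rows (sas_id, officer_defined_ethnicity, object_of_search),
# reconstructed to match the per-ethnicity counts in the question, not the fiddle data.
ROWS = [
    (1, "ethnicity1", "Firearms"),
    (2, "ethnicity1", "Cat"),
    (3, "ethnicity1", "Dog"),
    (4, "ethnicity1", "Psychoactive substances"),
    (5, "ethnicity1", "Fireworks"),
    (6, "ethnicity2", "Stolen goods"),
    (7, "ethnicity2", "Stolen goods"),
    (8, "ethnicity2", "Firearms"),
    (9, "ethnicity3", "Fireworks"),
    (10, "ethnicity3", "Firearms"),
]


def mode_per_group(rows, key):
    """Hypothetical helper: group rows by key(row), return the most frequent object per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[key(row)].append(row[2])
    # most_common(1) returns one of the most frequent values; ties get no special handling.
    return {k: Counter(objs).most_common(1)[0][0] for k, objs in groups.items()}


# Like GROUP BY sas_id, officer_defined_ethnicity: each group has exactly one row,
# so its "mode" is just that row's own object_of_search.
per_search = mode_per_group(ROWS, key=lambda r: (r[0], r[1]))
# Like SELECT DISTINCT: one row per distinct (ethnicity, object) pair remains.
distinct_pairs = {(k[1], obj) for k, obj in per_search.items()}

# Like GROUP BY officer_defined_ethnicity alone: one row per ethnicity.
per_ethnicity = mode_per_group(ROWS, key=lambda r: r[1])

print(len(distinct_pairs))          # 9
print(per_ethnicity["ethnicity2"])  # Stolen goods
```

The 9 distinct pairs correspond to the 9 rows in the question's actual output.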
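This should solve the specific "object per ethnicity" problem.
Note that it does not handle ties in the counts: among equally frequent values, ROW_NUMBER() picks one arbitrarily. That was not part of the question/request.
Adapt your SQL to include this logic to get the details: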
WITH cte AS (
SELECT officer_defined_ethnicity
, object_of_search
, COUNT(*) AS n
, ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity ORDER BY COUNT(*) DESC) AS rn
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
)
SELECT * FROM cte
WHERE rn = 1
;
Result:
officer_defined_ethnicity | object_of_search | n | rn
---------------------------+------------------+---+----
ethnicity1 | Cat | 1 | 1
ethnicity2 | Stolen goods | 2 | 1
ethnicity3 | Fireworks | 1 | 1
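The answer suggests adapting the original query to this logic. One possible way, sketched under assumptions, is to join the ranked rows back to the table. The count, arrest rate and most frequent object then come out of a single query with no subqueries in the SELECT list.

Assumptions and changes from the answer:
- It runs through Python's built-in sqlite3, so no PostgreSQL server is needed. Window functions require SQLite 3.25 or later.
- The sample rows and outcome values are hypothetical.
- Counting is split into its own CTE.
- object_of_search is added as a tiebreaker, which the CTE above does not have.

```python
import sqlite3

# Hypothetical sample data (officer_defined_ethnicity, object_of_search, outcome),
# reconstructed from the counts and arrest rates in the question, not the fiddle data.
ROWS = [
    ("ethnicity1", "Firearms", "Arrest"),
    ("ethnicity1", "Cat", "Arrest"),
    ("ethnicity1", "Dog", "Arrest"),
    ("ethnicity1", "Psychoactive substances", "No further action"),
    ("ethnicity1", "Fireworks", "No further action"),
    ("ethnicity2", "Stolen goods", "Arrest"),
    ("ethnicity2", "Stolen goods", "Arrest"),
    ("ethnicity2", "Firearms", "No further action"),
    ("ethnicity3", "Fireworks", "Arrest"),
    ("ethnicity3", "Firearms", "Arrest"),
]

QUERY = """
WITH counts AS (
    -- Count each (ethnicity, object) pair.
    SELECT officer_defined_ethnicity, object_of_search, COUNT(*) AS n
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
), ranked AS (
    -- Rank objects within each ethnicity; object_of_search breaks ties.
    SELECT officer_defined_ethnicity, object_of_search, n,
           ROW_NUMBER() OVER (PARTITION BY officer_defined_ethnicity
                              ORDER BY n DESC, object_of_search) AS rn
    FROM counts
)
SELECT s.officer_defined_ethnicity,
       COUNT(*) AS "Sas for ethnicity",
       100.0 * SUM(CASE WHEN s.outcome = 'Arrest' THEN 1 ELSE 0 END) / COUNT(*) AS "Arrest rate",
       r.object_of_search AS "Most frequent object of search"
FROM stopAndSearches s
JOIN ranked r
  ON r.officer_defined_ethnicity = s.officer_defined_ethnicity
 AND r.rn = 1  -- exactly one top-ranked object per ethnicity, so no row duplication
GROUP BY s.officer_defined_ethnicity, r.object_of_search
ORDER BY s.officer_defined_ethnicity
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE stopAndSearches ("
    "sas_id INTEGER PRIMARY KEY, officer_defined_ethnicity TEXT, "
    "object_of_search TEXT, outcome TEXT)"
)
con.executemany(
    "INSERT INTO stopAndSearches (officer_defined_ethnicity, object_of_search, outcome) "
    "VALUES (?, ?, ?)",
    ROWS,
)
result = con.execute(QUERY).fetchall()
for row in result:
    print(row)
```

Ties resolve alphabetically here, so ethnicity1 gets Cat and ethnicity3 gets Firearms. The Firearms and Fireworks in the question's expected output are also tied values. The SQL uses only standard features and should run unchanged in PostgreSQL.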
SELECT DISTINCT ON (1)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stopAndSearches
GROUP BY 1, 2
ORDER BY 1, 3 DESC, 2;
Or, more explicitly:
SELECT DISTINCT ON (officer_defined_ethnicity)
officer_defined_ethnicity, object_of_search, count(*) AS ct
FROM stopAndSearches
GROUP BY officer_defined_ethnicity, object_of_search
ORDER BY officer_defined_ethnicity, ct DESC, object_of_search;
officer_defined_ethnicity | object_of_search | ct
---------------------------+------------------+----
ethnicity1 | Cat | 1
ethnicity2 | Stolen goods | 2
ethnicity3 | Firearms | 1
db<>fiddle here
Because DISTINCT ON is applied after GROUP BY, we only need a single query level.
- Aggregate with GROUP BY to get the count for each (officer_defined_ethnicity, object_of_search) pair.
- Use DISTINCT ON to pick the row with the highest count for each officer_defined_ethnicity.
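The two steps above can be imitated in plain Python as a rough sketch, not PostgreSQL's actual execution.
- A Counter plays the role of GROUP BY.
- Sorting by (ethnicity, count descending, object) and keeping the first row per ethnicity plays the role of DISTINCT ON with that ORDER BY.

The sample rows are hypothetical, reconstructed from the counts in the question.

```python
from collections import Counter

# Hypothetical (officer_defined_ethnicity, object_of_search) rows matching the question's counts.
ROWS = [
    ("ethnicity1", "Firearms"),
    ("ethnicity1", "Cat"),
    ("ethnicity1", "Dog"),
    ("ethnicity1", "Psychoactive substances"),
    ("ethnicity1", "Fireworks"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Firearms"),
    ("ethnicity3", "Fireworks"),
    ("ethnicity3", "Firearms"),
]

# Step 1, like GROUP BY: count each (officer_defined_ethnicity, object_of_search) pair.
counts = Counter(ROWS)

# Step 2, like DISTINCT ON (officer_defined_ethnicity) with
# ORDER BY officer_defined_ethnicity, ct DESC, object_of_search:
# sort, then keep only the first row seen for each ethnicity.
ordered = sorted(counts.items(), key=lambda item: (item[0][0], -item[1], item[0][1]))
top = {}
for (ethnicity, obj), ct in ordered:
    top.setdefault(ethnicity, (obj, ct))  # later rows for the same ethnicity are ignored

for ethnicity, (obj, ct) in top.items():
    print(ethnicity, obj, ct)
```

On this data the result matches the DISTINCT ON output shown above: Cat, Stolen goods and Firearms.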
I added object_of_search as a third ORDER BY item to act as a tiebreaker and produce a deterministic result: in case of a tie, the first object_of_search in alphabetical order is picked.
Adapt to your needs.
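If you would rather see every tied value than break ties, one possible adaptation is to rank with RANK() instead of ROW_NUMBER() and keep rank 1. RANK() gives equal counts the same rank, so all tied objects survive. This technique is swapped in here and does not come from the answers.

The sketch runs it with Python's built-in sqlite3 (SQLite 3.25 or later for window functions) on hypothetical sample rows. The SQL is standard and should behave the same in PostgreSQL.

```python
import sqlite3

# Hypothetical (officer_defined_ethnicity, object_of_search) rows matching the question's counts.
ROWS = [
    ("ethnicity1", "Firearms"),
    ("ethnicity1", "Cat"),
    ("ethnicity1", "Dog"),
    ("ethnicity1", "Psychoactive substances"),
    ("ethnicity1", "Fireworks"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Stolen goods"),
    ("ethnicity2", "Firearms"),
    ("ethnicity3", "Fireworks"),
    ("ethnicity3", "Firearms"),
]

QUERY = """
WITH counts AS (
    SELECT officer_defined_ethnicity, object_of_search, COUNT(*) AS ct
    FROM stopAndSearches
    GROUP BY officer_defined_ethnicity, object_of_search
), ranked AS (
    -- RANK() assigns tied counts the same rank, so every tied object gets rank 1.
    SELECT officer_defined_ethnicity, object_of_search, ct,
           RANK() OVER (PARTITION BY officer_defined_ethnicity ORDER BY ct DESC) AS rnk
    FROM counts
)
SELECT officer_defined_ethnicity, object_of_search, ct
FROM ranked
WHERE rnk = 1
ORDER BY officer_defined_ethnicity, object_of_search
"""

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE stopAndSearches ("
    "sas_id INTEGER PRIMARY KEY, officer_defined_ethnicity TEXT, object_of_search TEXT)"
)
con.executemany(
    "INSERT INTO stopAndSearches (officer_defined_ethnicity, object_of_search) VALUES (?, ?)",
    ROWS,
)
ties = con.execute(QUERY).fetchall()
for row in ties:
    print(row)
```

With this data the query returns:
- ethnicity1: all five objects, since each appears once.
- ethnicity2: only Stolen goods.
- ethnicity3: both Firearms and Fireworks.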
See:
- Select first row in each GROUP BY group?
- Best way to get result count before LIMIT was applied
Simpler and typically faster than a subquery with row_number():
- Select first row in each GROUP BY group? - Benchmarks