Multiple DISTINCT ON clauses in PostgreSQL

Is it possible to select rows that are DISTINCT ON several separate, independent sets of columns?

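Suppose I want all rows that satisfy both of the following clauses:

DISTINCT ON (name, birth)
DISTINCT ON (name, height)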

So, in the following table, the rows marked with a red cross would not be distinct (the failing clause is indicated):

name      birth    height
--------------------------
William    1976      1.82
James      1981      1.68
Mike       1976      1.68
Tom        1967      1.79
William    1976      1.74   ❌ (name, birth)
William    1981      1.82   ❌ (name, height)
Tom        1978      1.92
Mike       1963      1.68   ❌ (name, height)
Tom        1971      1.86
James      1981      1.77   ❌ (name, birth)
Tom        1971      1.89   ❌ (name, birth)

In the example above, if the clause were just DISTINCT ON (name, birth, height), all rows would be considered distinct.

Tried, without success:

Use a derived table:

with my_table(name, birth, height) as (
values
('William',    1976,      1.82),
('James',      1981,      1.68),
('Mike',       1976,      1.68),
('Tom',        1967,      1.79),
('William',    1976,      1.74),  -- ❌ (name, birth)
('William',    1981,      1.82),  -- ❌ (name, height)
('Tom',        1978,      1.92),
('Mike',       1963,      1.68),  -- ❌ (name, height)
('Tom',        1971,      1.86),
('James',      1981,      1.77),  -- ❌ (name, birth)
('Tom',        1971,      1.89)   -- ❌ (name, birth)
)
select distinct on (name, height) *
from (
    select distinct on (name, birth) *
    from my_table
    ) s;

  name   | birth | height 
---------+-------+--------
 James   |  1981 |   1.68
 Mike    |  1963 |   1.68
 Tom     |  1967 |   1.79
 Tom     |  1971 |   1.89
 Tom     |  1978 |   1.92
 William |  1976 |   1.82
(6 rows)        

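There is ambiguity in the question: the number of result rows can differ from call to call. If you are satisfied with arbitrary results, the derived-table query above is good enough. Otherwise, you need to define the requirements more closely. For example:
Distinct on (name, birth): pick the smallest height first, then the smallest ID as tiebreaker.

Or:
Distinct on (name, height): pick the earliest birth first, then the smallest ID as tiebreaker.

Your table should have a primary key (or some other way to identify rows uniquely):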

CREATE TEMP TABLE tbl (
  tbl_id serial PRIMARY KEY
, name text
, birth int
, height numeric);

INSERT INTO tbl (name, birth, height)
VALUES
  ('William', 1976, 1.82)
, ('James',   1981, 1.68)
, ('Mike',    1976, 1.68)
, ('Tom',     1967, 1.79)
, ('William', 1976, 1.74)
, ('William', 1981, 1.82)
, ('Tom',     1978, 1.92)
, ('Mike',    1963, 1.68)
, ('Tom',     1971, 1.86)
, ('James',   1981, 1.77)
, ('Tom',     1971, 1.89);

Query:

SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height, tbl_id  -- pick smallest height, ID as tiebreaker
   ) sub
ORDER  BY name, height, birth, tbl_id;    -- pick earliest birth, ID as tiebreaker

 tbl_id |  name   | birth | height
--------+---------+-------+--------
      2 | James   |  1981 |   1.68
      8 | Mike    |  1963 |   1.68
      4 | Tom     |  1967 |   1.79
      9 | Tom     |  1971 |   1.86
      7 | Tom     |  1978 |   1.92
      5 | William |  1976 |   1.74
      6 | William |  1981 |   1.82
(7 rows)    -- !!!
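
To illustrate how much the outcome depends on which row survives the first pass, here is a sketch that changes only the inner tiebreaker so that the largest height is kept per (name, birth). The `height DESC` choice is made up for this illustration and is not part of the question; the data is inlined in a CTE (which shadows `tbl`) so the statement runs on its own.

```sql
-- Same rows as tbl, inlined so this statement is self-contained.
WITH tbl (tbl_id, name, birth, height) AS (
   VALUES
     (1,  'William', 1976, 1.82), (2,  'James', 1981, 1.68)
   , (3,  'Mike',    1976, 1.68), (4,  'Tom',   1967, 1.79)
   , (5,  'William', 1976, 1.74), (6,  'William', 1981, 1.82)
   , (7,  'Tom',     1978, 1.92), (8,  'Mike',  1963, 1.68)
   , (9,  'Tom',     1971, 1.86), (10, 'James', 1981, 1.77)
   , (11, 'Tom',     1971, 1.89)
)
SELECT DISTINCT ON (name, height) *
FROM  (
   SELECT DISTINCT ON (name, birth) *
   FROM   tbl
   ORDER  BY name, birth, height DESC, tbl_id  -- illustration only: pick LARGEST height
   ) sub
ORDER  BY name, height, birth, tbl_id;         -- pick earliest birth, ID as tiebreaker
-- Returns 6 rows (tbl_id 10, 8, 4, 11, 7, 1) instead of 7.
```

With this ordering the first pass keeps William 1976 at 1.82, which then collides with William 1981 at 1.82 in the second pass, so one more row is dropped. The same data and the same two DISTINCT ON clauses give a different row count.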

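A DISTINCT ON query without a deterministic ORDER BY can return an arbitrary row from each set of duplicates. Applied once, you still get a deterministic number of rows (with arbitrary picks). Applied repeatedly, the number of result rows becomes arbitrary as well. Related: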

  • Select first row in each GROUP BY group?
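
If the intended rule is instead "keep a row only when it is the first of its (name, birth) group and also the first of its (name, height) group", one possible way to pin that down is with window functions rather than nested DISTINCT ON. This is a different technique from the queries above, and it rests on an assumption: "first" means the smallest `tbl_id`, with IDs numbered in the order the rows appear in the question. The aliases `rn_birth` and `rn_height` are names chosen for this sketch.

```sql
-- Same rows as tbl, inlined so this statement is self-contained.
WITH tbl (tbl_id, name, birth, height) AS (
   VALUES
     (1,  'William', 1976, 1.82), (2,  'James', 1981, 1.68)
   , (3,  'Mike',    1976, 1.68), (4,  'Tom',   1967, 1.79)
   , (5,  'William', 1976, 1.74), (6,  'William', 1981, 1.82)
   , (7,  'Tom',     1978, 1.92), (8,  'Mike',  1963, 1.68)
   , (9,  'Tom',     1971, 1.86), (10, 'James', 1981, 1.77)
   , (11, 'Tom',     1971, 1.89)
)
SELECT tbl_id, name, birth, height
FROM  (
   SELECT *
        -- position of the row within its (name, birth) group, by ID
        , row_number() OVER (PARTITION BY name, birth  ORDER BY tbl_id) AS rn_birth
        -- position of the row within its (name, height) group, by ID
        , row_number() OVER (PARTITION BY name, height ORDER BY tbl_id) AS rn_height
   FROM   tbl
   ) sub
WHERE  rn_birth = 1      -- first of its (name, birth) group
AND    rn_height = 1     -- and first of its (name, height) group
ORDER  BY tbl_id;
-- Returns 6 rows: tbl_id 1, 2, 3, 4, 7, 9
```

Both rankings are computed over the full table, so the result does not depend on the order in which the two clauses are applied. For this data it keeps exactly the rows without a red cross in the question. Note that a row is dropped even when the earlier row it duplicates has itself been dropped, which may or may not be what is wanted.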