SQL Select without duplicates, and conditional choice of the proper duplicate to keep

Question

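I have a read-only table containing a list of products, and I need to avoid selecting duplicates based on the serial number ('serial'). When a serial has duplicates, I want to select the one whose 'label' starts with a letter between A and J.

Here is my data and my attempt at a selection without duplicates: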

CREATE TABLE products(id INT, serial VARCHAR(25), label VARCHAR(50) , type VARCHAR(25));

INSERT INTO products(id, serial, label, type)
    VALUES
        ( 1, '111', 'A1', 'computer'),
        ( 2, '222', 'B2', 'computer'),
        ( 3, '333', 'Z3', 'computer'),
        ( 4, '333', 'D4', 'computer'),
        ( 5, '555', 'E5', 'computer'),
        ( 6, '666', 'X6', 'computer'),
        ( 7, '777', 'G7', 'computer'),
        ( 8, '777', 'Y7', 'computer'),
        ( 9, '888', 'I8', 'computer'),
        (10, '999', 'J9', 'screen'),
        (11, '777', 'G7bis', 'computer'),
        (12, '666', 'X6bis', 'computer');


SELECT COUNT(serial) OVER(PARTITION BY serial) as nbserial, *
FROM products
where type='computer' and nbserial=1 or
(nbserial>1 and LEFT(label, 1) between 'A' and 'J')
;

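I have several problems: here I cannot define a condition on nbserial in the WHERE clause. If there are 3 duplicates, I need to select one row that satisfies the condition: first letter of the label between A and J. If there are several duplicates but none of them satisfies the condition (first letter between A and J), then any one of the rows can be selected.

Example of the expected result (no duplicate serials and, where possible, a label starting with a letter between A and J):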

    ( 1, '111', 'A1', 'computer'),
    ( 2, '222', 'B2', 'computer'),
    ( 4, '333', 'D4', 'computer'),
    ( 5, '555', 'E5', 'computer'),
    ( 6, '666', 'X6', 'computer'),
    ( 7, '777', 'G7', 'computer'),
    ( 9, '888', 'I8', 'computer'),
    (10, '999', 'J9', 'screen'),

How can I do this with a SELECT, given that I cannot change the table contents?

Thanks

Answer 1

You can use row_number() and a conditional sort:

select *
from (
    select p.*,
        row_number() over(
            partition by serial
            order by case when left(label, 1) between 'A' and 'J' then 0 else 1 end, id
        ) rn
    from products p
) p
where rn = 1

Or better yet, use distinct on in Postgres:

select distinct on (serial) p.*
from products p
order by serial, (left(label, 1) between 'A' and 'J') desc, id

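This gives you one row per serial and prioritizes labels whose first letter is between 'A' and 'J'. When there are ties, the row with the smallest id is kept.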
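To see how the ordering decides which row wins, it can help to list the sort key and the rank next to each row. The following is only an illustrative sketch against the products table from the question; the column alias priority is my own name, and the filter on three serials is just there to keep the output short.

```sql
-- Illustrative only: show the sort key and rank that the row_number()
-- approach assigns, restricted to the serials that have duplicates.
select p.id, p.serial, p.label,
    -- 0 = label starts with a letter between 'A' and 'J' (preferred), 1 = otherwise
    case when left(p.label, 1) between 'A' and 'J' then 0 else 1 end as priority,
    row_number() over(
        partition by p.serial
        order by case when left(p.label, 1) between 'A' and 'J' then 0 else 1 end, p.id
    ) as rn
from products p
where p.serial in ('333', '666', '777')
order by p.serial, rn;
```

With the sample data, D4 should rank ahead of Z3 for serial 333, G7 ahead of G7bis and Y7 for serial 777, and X6 ahead of X6bis for serial 666, where no label matches and the lower id breaks the tie. Note that between on strings follows the database collation, so labels starting with lowercase letters may be classified differently from one setup to another.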

Demo on DB Fiddle:

id | serial | label | type    
-: | :----- | :---- | :-------
 1 | 111    | A1    | computer
 2 | 222    | B2    | computer
 4 | 333    | D4    | computer
 5 | 555    | E5    | computer
 6 | 666    | X6    | computer
 7 | 777    | G7    | computer
 9 | 888    | I8    | computer
10 | 999    | J9    | screen
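On the first problem raised in the question: window functions are evaluated after the WHERE clause, so an alias such as nbserial cannot be filtered on at the same query level. If you still want the duplicate count in the output, one option is to compute both window columns in a subquery or CTE and filter outside it. This is a sketch under that assumption; the CTE name ranked is hypothetical.

```sql
-- Sketch: compute the window columns first, then filter on them one level up.
with ranked as (
    select p.*,
        -- number of rows sharing this serial (the question's nbserial)
        count(*) over(partition by serial) as nbserial,
        -- rank within the serial: preferred labels first, then lowest id
        row_number() over(
            partition by serial
            order by case when left(label, 1) between 'A' and 'J' then 0 else 1 end, id
        ) as rn
    from products p
)
select id, serial, label, type, nbserial
from ranked
where rn = 1
order by id;
```

This should return the same eight rows as above, with nbserial showing how many rows each serial had before deduplication.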

