SQL - delete duplicates with null values WITHOUT row_number

I'm using SQLite, and I have the following table, x, which contains about 300k rows. Here is a sample:

name surname nickname
Jeniffer Doe Jenny
Jeniffer Doe NULL
Jeniffer Doe Jenny

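Every row has a Name, Surname, and Nickname value, where Nickname can also be NULL. None of the values are unique, and there is no key. What I want to do is group the rows by Name-Surname pair and delete the "duplicates" whose Nickname is NULL, while also deleting the "actual" duplicates.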

This would be the expected output:

name surname nickname
Jeniffer Doe Jenny

I tried the following query:

select x.* 
from (select x.*, 
          row_number() over (partition by name, surname order by nickname nulls last) as seqnum
     from x)
     ) x
where seqnum = 1;

But unfortunately I get an error:

near "(": syntax error

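I've been searching for an answer for days, but I still can't figure it out. If it matters, I'm using SQLite3. I'm still not sure why I get a syntax error. I think it has to do with row_number(), but from what I've found online, sqlite3 should support it (in my case it apparently doesn't, as far as I can tell).

So now I'm looking for a way to modify this query to get the desired output, but I've been stuck for days...

Any help would be appreciated!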

For SQLite 3.15, you can try something like this, which avoids window functions, NULLS LAST, etc.:

The fiddle

SELECT DISTINCT x.*
  FROM x
 WHERE NOT EXISTS (
           SELECT 1 FROM x AS x1
            WHERE (x.name,  x.surname) = (x1.name,  x1.surname)
              AND COALESCE(x.nickname, x1.nickname || 'x')
                > COALESCE(x1.nickname, x.nickname || 'x')
       )
;

Result:

name surname nickname
Jeniffer Doe Jenny
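To make it easier to check the NOT EXISTS query without the fiddle, here is a minimal sketch that runs it against an in-memory database through Python's standard sqlite3 module. The extra John Roe rows and the trailing ORDER BY are my own additions for illustration (they are not part of the question's data); they show what happens to a group whose nicknames are all NULL. It assumes the linked SQLite library supports row values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE x (name TEXT, surname TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO x VALUES (?, ?, ?)",
    [
        ("Jeniffer", "Doe", "Jenny"),
        ("Jeniffer", "Doe", None),
        ("Jeniffer", "Doe", "Jenny"),
        # Hypothetical extra group (not in the question): only NULL nicknames.
        ("John", "Roe", None),
        ("John", "Roe", None),
    ],
)

# Same query as above, plus an ORDER BY so the output order is deterministic.
QUERY = """
SELECT DISTINCT x.*
  FROM x
 WHERE NOT EXISTS (
           SELECT 1 FROM x AS x1
            WHERE (x.name, x.surname) = (x1.name, x1.surname)
              AND COALESCE(x.nickname, x1.nickname || 'x')
                > COALESCE(x1.nickname, x.nickname || 'x')
       )
 ORDER BY x.name, x.surname
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

A NULL nickname is compared as the other row's nickname plus 'x', so it always sorts after a real nickname in the same group and gets filtered out. When every nickname in the group is NULL, the comparison itself is NULL, nothing is filtered, and DISTINCT collapses the group to a single row.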

For newer versions of SQLite:

order by nickname IS NULL, nickname takes care of the NULLS LAST requirement, because nickname IS NULL yields 0 when false and 1 when true, so the NULL case sorts last.

select x.* 
  from (
         select x.*
              , row_number() over (partition by name, surname order by nickname IS NULL, nickname) as seqnum
           from x
       ) AS x
 where seqnum = 1
;
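If the sort-key trick is not obvious, this small sketch (again via Python's sqlite3 module, with made-up nicknames 'Anna' and 'Jenny' as sample data) compares the default ordering with the nickname IS NULL ordering. It does not use window functions, so it should also run on older SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Illustrative sample values only; 'Anna' is not from the question.
CTE = """
WITH t (nickname) AS (
    SELECT NULL UNION ALL SELECT 'Jenny' UNION ALL SELECT 'Anna'
)
"""

# Default ascending order: SQLite treats NULL as smaller than any other value.
default_order = conn.execute(
    CTE + "SELECT nickname FROM t ORDER BY nickname"
).fetchall()

# Sorting on (nickname IS NULL) first pushes the NULL row to the end,
# because the expression is 0 for real values and 1 for NULL.
nulls_last = conn.execute(
    CTE + "SELECT nickname, nickname IS NULL AS is_null "
          "FROM t ORDER BY nickname IS NULL, nickname"
).fetchall()

print(default_order)
print(nulls_last)
```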

Test case:

WITH x (name, surname, nickname) AS (
        SELECT 'Jeniffer',  'Doe',         'Jenny' UNION ALL
        SELECT 'Jeniffer',  'Doe',         NULL    UNION ALL
        SELECT 'Jeniffer',  'Doe',         'Jenny'
     )
select x.* 
  from (
         select x.*
              , row_number() over (partition by name, surname order by nickname is null, nickname) as seqnum
           from x
       ) AS x
 where seqnum = 1
;

Result:

name surname nickname seqnum
Jeniffer Doe Jenny 1
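Which of the queries applies depends on the SQLite library version actually in use. As far as I know from the SQLite release notes, row values arrived in 3.15.0, window functions in 3.25.0, and NULLS LAST in 3.30.0. Here is a rough sketch for checking this from Python; the flag names and the approach labels are hypothetical names of mine, and note that the sqlite3 command-line shell or another application may be linked against a different library version than Python is.

```python
import sqlite3

# Version of the SQLite library linked into this Python build, as a tuple.
version = sqlite3.sqlite_version_info

# Feature thresholds (assumed from the SQLite release notes).
has_row_values = version >= (3, 15, 0)        # (a, b) = (c, d) comparisons
has_window_functions = version >= (3, 25, 0)  # row_number() OVER (...)
has_nulls_last = version >= (3, 30, 0)        # ORDER BY ... NULLS LAST

# The same information is available from SQL itself.
conn = sqlite3.connect(":memory:")
runtime_version = conn.execute("SELECT sqlite_version()").fetchone()[0]

# Hypothetical labels for the approaches discussed in this thread.
if has_window_functions:
    approach = "row_number"
elif has_row_values:
    approach = "not_exists"
else:
    approach = "rowid_aggregate"

print(runtime_version, approach)
```

Running SELECT sqlite_version(); directly in whatever tool produced the syntax error is the quickest way to confirm whether window functions are available there.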

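Unless you created the table WITHOUT ROWID, the table has an implicit rowid column that serves as its primary key.

You can use aggregation to get, for each combination of name and surname, the rowid of the one row that should not be deleted: the minimum rowid among the rows with a non-NULL nickname or, if the group has none, the minimum rowid overall.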

DELETE FROM tablename
WHERE rowid NOT IN (
  SELECT COALESCE(
           MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
           MIN(rowid)
         )  
  FROM tablename 
  GROUP BY name, surname
);

See the demo.
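For a quick local check of the DELETE, here is a minimal sketch using Python's sqlite3 module and an in-memory table. The sample rows are my own arrangement: the NULL row is deliberately inserted first to show that the COALESCE prefers a non-NULL nickname over the lowest rowid, and the John Roe group is a hypothetical addition with only NULL nicknames.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tablename (name TEXT, surname TEXT, nickname TEXT)")
conn.executemany(
    "INSERT INTO tablename VALUES (?, ?, ?)",
    [
        ("Jeniffer", "Doe", None),     # rowid 1: lowest rowid, but NULL nickname
        ("Jeniffer", "Doe", "Jenny"),  # rowid 2: kept (lowest non-NULL rowid)
        ("Jeniffer", "Doe", "Jenny"),  # rowid 3: actual duplicate
        ("John", "Roe", None),         # rowid 4: kept (group has only NULLs)
        ("John", "Roe", None),         # rowid 5: duplicate NULL row
    ],
)

DELETE_SQL = """
DELETE FROM tablename
WHERE rowid NOT IN (
  SELECT COALESCE(
           MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
           MIN(rowid)
         )
  FROM tablename
  GROUP BY name, surname
)
"""

with conn:  # commits on success, rolls back if the statement fails
    deleted = conn.execute(DELETE_SQL).rowcount

remaining = conn.execute(
    "SELECT rowid, name, surname, nickname FROM tablename ORDER BY rowid"
).fetchall()
print(deleted, remaining)
```

Note that this keeps exactly one row per name and surname pair, so if a pair has several different non-NULL nicknames, only the one with the lowest rowid survives.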