SQL - delete duplicates with null values WITHOUT row_number
I'm using SQLite and have the following table, x, with roughly 300k rows. Here is a sample:
name | surname | nickname |
---|---|---|
Jeniffer | Doe | Jenny |
Jeniffer | Doe | NULL |
Jeniffer | Doe | Jenny |
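Every row has Name, Surname, and Nickname values, where Nickname can also be NULL. None of the values are unique, and there is no key. What I want to do is delete the "duplicates" whose Nickname is NULL, grouping rows by Name-Surname pair, while also deleting the "actual" duplicates.
This would be the expected output: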
name | surname | nickname |
---|---|---|
Jeniffer | Doe | Jenny |
I tried the following query:
select x.*
from (select x.*,
row_number() over (partition by name, surname order by nickname nulls last) as seqnum
from x)
) x
where seqnum = 1;
But unfortunately I get an error:
near "(": syntax error
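I've been searching for an answer for days, but I still can't figure it out. If it matters, I'm using SQLite3. I'm still not sure why I get a syntax error. I think it has to do with row_number(), but from what I've found online, sqlite3 should support it (although in my case it apparently does not, as far as I understand).
So now I'm looking for a way to modify this query to get the desired output, but I've been stuck for days...
Any help would be appreciated!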
For 3.15, you can try something like this, which avoids window functions, NULLS LAST, and so on:
SELECT DISTINCT x.*
FROM x
WHERE NOT EXISTS (
SELECT 1 FROM x AS x1
WHERE (x.name, x.surname) = (x1.name, x1.surname)
AND COALESCE(x.nickname, x1.nickname || 'x')
> COALESCE(x1.nickname, x.nickname || 'x')
)
;
Result:
name | surname | nickname |
---|---|---|
Jeniffer | Doe | Jenny |
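To try the NOT EXISTS approach without touching real data, here is a minimal sketch using Python's built-in sqlite3 module and an in-memory database. The SQL is the query above with an ORDER BY added for a stable result. The extra "John Smith" row and the helper name dedupe_without_window are assumptions added for illustration. The row-value comparison needs SQLite 3.15 or later.

```python
import sqlite3

# Hypothetical sample data: the question's three rows plus one group
# whose only row has a NULL nickname (added for illustration).
SAMPLE_ROWS = [
    ("Jeniffer", "Doe", "Jenny"),
    ("Jeniffer", "Doe", None),
    ("Jeniffer", "Doe", "Jenny"),
    ("John", "Smith", None),
]

QUERY = """
SELECT DISTINCT x.*
FROM x
WHERE NOT EXISTS (
    SELECT 1 FROM x AS x1
    WHERE (x.name, x.surname) = (x1.name, x1.surname)
      AND COALESCE(x.nickname, x1.nickname || 'x')
        > COALESCE(x1.nickname, x.nickname || 'x')
)
ORDER BY name, surname
"""


def dedupe_without_window(rows):
    """Load rows into an in-memory table x and return the deduplicated rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE x (name TEXT, surname TEXT, nickname TEXT)")
        conn.executemany("INSERT INTO x VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dedupe_without_window(SAMPLE_ROWS):
        print(row)
```

A group that contains only NULL nicknames keeps its row, because both COALESCE calls evaluate to NULL and the comparison is never true. Note that if a name-surname pair has several different non-NULL nicknames, this query keeps only the smallest one.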
For newer versions of SQLite:
order by nickname IS NULL, nickname satisfies the NULLS LAST requirement, because nickname IS NULL yields 0 when false and 1 when true, so the NULL case sorts last.
select x.*
from (
select x.*
, row_number() over (partition by name, surname order by nickname IS NULL, nickname) as seqnum
from x
) AS x
where seqnum = 1
;
Test case:
WITH x (name, surname, nickname) AS (
SELECT 'Jeniffer', 'Doe', 'Jenny' UNION ALL
SELECT 'Jeniffer', 'Doe', NULL UNION ALL
SELECT 'Jeniffer', 'Doe', 'Jenny'
)
select x.*
from (
select x.*
, row_number() over (partition by name, surname order by nickname is null, nickname) as seqnum
from x
) AS x
where seqnum = 1
;
Result:
name | surname | nickname | seqnum |
---|---|---|---|
Jeniffer | Doe | Jenny | 1 |
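The same check can be scripted. The sketch below, again using Python's sqlite3 module, first tests whether the linked SQLite library is new enough for window functions (they were added in SQLite 3.25.0, which would also explain the syntax error on 3.15) and only then runs the row_number() query. The function names, the "ranked" alias, and the extra "John Smith" row are illustrative assumptions, not part of the answer.

```python
import sqlite3

WINDOW_QUERY = """
SELECT name, surname, nickname
FROM (
    SELECT x.*,
           row_number() OVER (
               PARTITION BY name, surname
               ORDER BY nickname IS NULL, nickname
           ) AS seqnum
    FROM x
) AS ranked
WHERE seqnum = 1
ORDER BY name, surname
"""


def window_functions_available():
    """Window functions were added in SQLite 3.25.0."""
    return sqlite3.sqlite_version_info >= (3, 25, 0)


def first_row_per_pair(rows):
    """Return one row per (name, surname), preferring a non-NULL nickname."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE x (name TEXT, surname TEXT, nickname TEXT)")
        conn.executemany("INSERT INTO x VALUES (?, ?, ?)", rows)
        return conn.execute(WINDOW_QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    sample = [
        ("Jeniffer", "Doe", "Jenny"),
        ("Jeniffer", "Doe", None),
        ("Jeniffer", "Doe", "Jenny"),
        ("John", "Smith", None),  # hypothetical extra row
    ]
    if window_functions_available():
        print(first_row_per_pair(sample))
    else:
        print("SQLite", sqlite3.sqlite_version, "has no window functions")
```

Selecting the three named columns in the outer query, rather than x.*, keeps seqnum out of the final result.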
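Unless you created the table WITHOUT ROWID, there is a column rowid that serves as the primary key.
You can use aggregation to get, for each combination of name and surname, the minimum rowid of the row that should not be deleted: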
DELETE FROM tablename
WHERE rowid NOT IN (
SELECT COALESCE(
MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
MIN(rowid)
)
FROM tablename
GROUP BY name, surname
);
See the demo.
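Because this statement deletes rows in place, it may be worth rehearsing it on a copy first. The following is a minimal sketch with Python's sqlite3 module on an in-memory table named x (the answer uses the placeholder tablename). The helper names and the extra "John Smith" row are assumptions for illustration.

```python
import sqlite3

DELETE_SQL = """
DELETE FROM x
WHERE rowid NOT IN (
    SELECT COALESCE(
        MIN(CASE WHEN nickname IS NOT NULL THEN rowid END),
        MIN(rowid)
    )
    FROM x
    GROUP BY name, surname
)
"""


def make_sample_db():
    """Build an in-memory table x with the question's sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE x (name TEXT, surname TEXT, nickname TEXT)")
    conn.executemany(
        "INSERT INTO x VALUES (?, ?, ?)",
        [
            ("Jeniffer", "Doe", "Jenny"),
            ("Jeniffer", "Doe", None),
            ("Jeniffer", "Doe", "Jenny"),
            ("John", "Smith", None),  # hypothetical extra row
        ],
    )
    conn.commit()
    return conn


def delete_duplicates(conn):
    """Run the DELETE in one transaction and return the number of rows removed."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(DELETE_SQL)
    return cur.rowcount


if __name__ == "__main__":
    conn = make_sample_db()
    print("deleted:", delete_duplicates(conn))
    print(conn.execute("SELECT * FROM x ORDER BY name").fetchall())
    conn.close()
```

For each name-surname pair the subquery keeps the lowest rowid that has a non-NULL nickname, and falls back to the lowest rowid overall when every nickname in the group is NULL.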