Dedup rows case-insensitively (Snowflake)

I want to remove rows when the same name appears more than once, ignoring case.

Original table:

ID Name
1 Apple
2 Banana
1 apple
2 APPLE
3 BANANA

Expected output after dedup (the lowercase spelling is preferred when there are multiple instances):

ID Name
2 Banana
1 apple
2 apple
3 Banana

ID 1 'Apple' is removed because ID 1 'apple' exists. ID 2 'APPLE' becomes 'apple' because ID 1 'apple' exists. ID 3 'BANANA' becomes 'Banana' because the more lowercase spelling is preferred.

The statement below only dedups within each ID group. As a result, ID 2 'APPLE' stays 'APPLE' and ID 3 'BANANA' stays 'BANANA', which is not what I want.

create table DELETE2 as select ID, max(Name) as Name
FROM TEST."PUBLIC"."DELETE1"
group by ID, lower(Name);

drop table DELETE1;
alter table DELETE2 rename to DELETE1;

How about:

create table DELETE2 as 
select ID, Name
from (
        select ID, lower(Name) as Name1, max(Name) as Name
        FROM TEST."PUBLIC"."DELETE1"
        group by ID, lower(Name)
     )
;

Working SQL that you can paste into Snowflake and run:

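Technique: turn each word into an array of characters, convert each character to its ASCII code, and sum the codes. Lowercase letters have higher ASCII codes than uppercase letters, so within a group of spellings the one with the largest sum is the most lowercase.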
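
To make the comparison concrete, here is a small sketch of the arithmetic for the three spellings of "apple" in the sample data. The sums in the comments are worked by hand. The query itself only confirms the two character codes they rest on, using Snowflake's ASCII function, which returns the code of the first character of its argument.

```sql
-- Hand-computed character-code sums:
--   'APPLE' = 65 + 80  + 80  + 76  + 69   = 370
--   'Apple' = 65 + 112 + 112 + 108 + 101  = 498
--   'apple' = 97 + 112 + 112 + 108 + 101  = 530   (largest, so this spelling wins)
select
    ascii('A') as upper_a,   -- 65
    ascii('a') as lower_a;   -- 97
```

The same arithmetic gives 577 for 'Banana' and 417 for 'BANANA', which is why 'Banana' is the preferred spelling in that group.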

No updates... no functions... just plain old SQL ;-)

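with cte as (
    -- sample data from the question
    select 1 ID, 'Apple' name
    union select 2 ID, 'Banana' name
    union select 1 ID, 'apple' name
    union select 2 ID, 'APPLE' name
    union select 3 ID, 'BANANA' name
),
lu as (
    -- one row per distinct spelling; g is non-null only for the spelling(s)
    -- with the highest character-code sum within their lower(name) group
    select
        name,
        lower(name) lu_name,
        sum(ascii(a.value :: string)) ac,
        max(ac) over (partition by lower(name)) mac,
        iff(max(ac) over (partition by lower(name)) = sum(ascii(a.value :: string)), name, null) g
    from
        cte,
        -- put a comma before every character from position 2 on, then split into single characters
        lateral flatten(
            input => split(regexp_replace(name, '.', ',\0', 2), ',')
        ) a
    group by 1, 2
)
-- swap each name for the preferred spelling of its group, then dedup on (id, name)
select
    cte.id, lu.name
from
    cte
    left outer join lu on lower(cte.name) = lu.lu_name and lu.g is not null
group by 1, 2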
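
For comparison, here is a shorter sketch that leans on the same fact (lowercase codes are higher than uppercase codes) without splitting strings. It is a different technique from the ASCII-sum query above, not a rewrite of it. It assumes the column uses Snowflake's default binary string comparison with no collation specified, so that max() returns the spelling that sorts last character by character. The column alias preferred_name is made up for illustration.

```sql
-- Sketch only: assumes default (binary) string comparison, where 'a' (97) sorts after 'A' (65).
with cte as (
    select 1 as id, 'Apple' as name
    union all select 2, 'Banana'
    union all select 1, 'apple'
    union all select 2, 'APPLE'
    union all select 3, 'BANANA'
)
select distinct
    id,
    -- the spelling that sorts last within each case-insensitive group
    max(name) over (partition by lower(name)) as preferred_name
from cte;
```

On the sample data this should yield the four rows (1, apple), (2, Banana), (2, apple), (3, Banana), in no guaranteed order. The two approaches can disagree on mixed-case spellings: between 'aPPLE' and 'Apple', max() picks 'aPPLE' because it compares from the left and 'a' sorts after 'A', while the character-code sum picks 'Apple' (498 versus 402). Which behavior counts as "lowercase preferred" depends on the data.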