How to add a column indicating a repeat ID in Snowflake (SQL)?

Question

So I have a table like this, where each ID is unique (one row per ID):

table1

 ID    data
001  Walter
002  Skylar
003    Hank
004   Marie

I have another table in which an ID can appear more than once:

table2

 ID     value
001     apple
001    banana
003     grape
004  graphite
003     jones
001      pear

Given these two tables, I want to add a column to table1 that indicates whether an ID appears more than once in table2.

Final result:

 ID    data  table2_multiple
001  Walter                1
002  Skylar                0
003    Hank                1
004   Marie                0

Here ID = 001 and ID = 003 both have table2_multiple = 1, because each of them appears more than once in table2.
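The rule behind the expected output can be stated without SQL: count how often each ID occurs in table2 and flag it when that count is greater than 1. A minimal Python sketch over the question's sample data, where the list literals are only a stand-in for the two tables:

```python
from collections import Counter

# Sample data copied from the question (IDs kept as strings to preserve leading zeros).
table1 = [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")]
table2 = [("001", "apple"), ("001", "banana"), ("003", "grape"),
          ("004", "graphite"), ("003", "jones"), ("001", "pear")]

# Occurrences of each ID in table2; a Counter returns 0 for IDs it has never seen.
counts = Counter(id_ for id_, _ in table2)

# table2_multiple is 1 only when the ID occurs more than once in table2.
result = [(id_, data, 1 if counts[id_] > 1 else 0) for id_, data in table1]

for row in result:
    print(*row)
```

ID 004 occurs exactly once, so it is flagged 0 just like 002, which does not occur at all.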

Answer 1

Although this is a rather odd thing to do, you can do it like this:

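-- add the column first; IDs with no rows in table2 keep the default 0
alter table table1 add column table2_multiple number default 0;

update table1
set table2_multiple = case when t.cnt > 1 then 1 else 0 end
from (select ID, count(*) cnt from table2 group by ID) t
where t.id = table1.id;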
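To try the update logic locally, the sketch below loads the sample data into an in-memory SQLite database through Python's standard `sqlite3` module. This is SQLite, not Snowflake, and it deliberately swaps `UPDATE ... FROM` for a correlated subquery, because SQLite versions before 3.33 do not support `UPDATE ... FROM`; the counting rule is the same.

```python
import sqlite3

# In-memory SQLite copy of the question's sample tables (SQLite, not Snowflake).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, data TEXT)")
con.execute("CREATE TABLE table2 (ID TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")])
con.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("001", "apple"), ("001", "banana"), ("003", "grape"),
                 ("004", "graphite"), ("003", "jones"), ("001", "pear")])

# The column has to exist before it can be updated.
con.execute("ALTER TABLE table1 ADD COLUMN table2_multiple INTEGER DEFAULT 0")

# Correlated subquery: count the table2 rows for each table1 ID and flag counts above 1.
con.execute("""
    UPDATE table1
    SET table2_multiple = CASE
        WHEN (SELECT COUNT(*) FROM table2 t2 WHERE t2.ID = table1.ID) > 1 THEN 1
        ELSE 0
    END
""")

rows = con.execute("SELECT ID, data, table2_multiple FROM table1 ORDER BY ID").fetchall()
print(rows)
```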

Or, if you only want to select:

select t1.*, case when t2.cnt > 1 then 1 else 0 end as table2_multiple
from table1 t1
-- left join keeps IDs such as 002 that never appear in table2
left join (select ID, count(*) cnt from table2 group by ID) t2
on t1.id = t2.id
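The join type matters here: ID 002 has no rows in table2, so an inner join against the grouped subquery drops it from the result, while a left join keeps it with a NULL count that the `case` expression turns into 0. A small check of both variants in SQLite (standing in for Snowflake); `flags` is a hypothetical helper name used only for this sketch:

```python
import sqlite3

# In-memory SQLite copy of the question's sample tables (SQLite, not Snowflake).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, data TEXT)")
con.execute("CREATE TABLE table2 (ID TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")])
con.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("001", "apple"), ("001", "banana"), ("003", "grape"),
                 ("004", "graphite"), ("003", "jones"), ("001", "pear")])


def flags(join_kind):
    """Run the grouped-subquery query with the given join keyword."""
    sql = f"""
        SELECT t1.ID, t1.data,
               CASE WHEN t2.cnt > 1 THEN 1 ELSE 0 END AS table2_multiple
        FROM table1 t1
        {join_kind} (SELECT ID, COUNT(*) AS cnt FROM table2 GROUP BY ID) t2
          ON t1.ID = t2.ID
        ORDER BY t1.ID
    """
    return con.execute(sql).fetchall()


inner = flags("JOIN")       # ID 002 is missing from this result
left = flags("LEFT JOIN")   # all four IDs; 002 gets a NULL count, flagged 0
print(inner)
print(left)
```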

Answer 2

In all of the examples, a case expression returns 1 if the count is > 1 and 0 otherwise.

Basic aggregate function:

SELECT t1.ID, t1.Data, case when count(*) > 1 then 1 else 0 end as table2_Multiple
FROM Table1 t1 --t1 is an alias of table1
LEFT JOIN table2 t2 --t2 is an alias of table2
 ON t1.ID = t2.ID
GROUP BY T1.ID, T1.Data
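With a LEFT JOIN, an ID that has no match (002 here) still produces one joined row, so count(*) returns 1 for it rather than 0. That does not change this answer, because the test is > 1, but count(t2.ID) gives the true number of table2 rows if you ever need it. An SQLite sketch (standing in for Snowflake) that shows both counts side by side:

```python
import sqlite3

# In-memory SQLite copy of the question's sample tables (SQLite, not Snowflake).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, data TEXT)")
con.execute("CREATE TABLE table2 (ID TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")])
con.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("001", "apple"), ("001", "banana"), ("003", "grape"),
                 ("004", "graphite"), ("003", "jones"), ("001", "pear")])

rows = con.execute("""
    SELECT t1.ID, t1.data,
           COUNT(*)     AS count_star,  -- also counts the NULL-extended row for 002
           COUNT(t2.ID) AS count_t2,    -- counts only rows actually matched in table2
           CASE WHEN COUNT(*) > 1 THEN 1 ELSE 0 END AS table2_multiple
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.ID = t2.ID
    GROUP BY t1.ID, t1.data
    ORDER BY t1.ID
""").fetchall()

for row in rows:
    print(row)
```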

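Using an analytic function: count(*) over (partition by T1.ID, T1.Data) counts all joined records for each unique T1.ID and Data, and the case expression then returns 1 if that count is > 1 and 0 otherwise. DISTINCT then removes the duplicate rows.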

SELECT DISTINCT T1.ID
     , T1.Data
     , case when count(*) over (partition by T1.ID, T1.Data) > 1 then 1 else 0 end as table2_Multiple
FROM Table1 T1
LEFT JOIN Table2 T2
  on T1.ID = T2.ID
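The analytic version can be checked the same way. The sketch assumes the SQLite bundled with Python is version 3.25 or newer, which is when SQLite gained window functions; SQLite only stands in for Snowflake here. It also counts the joined rows before DISTINCT to show why the de-duplication step is needed.

```python
import sqlite3

# In-memory SQLite copy of the question's sample tables (SQLite, not Snowflake).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, data TEXT)")
con.execute("CREATE TABLE table2 (ID TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")])
con.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("001", "apple"), ("001", "banana"), ("003", "grape"),
                 ("004", "graphite"), ("003", "jones"), ("001", "pear")])

# Without DISTINCT the left join yields one row per match (plus one for 002).
joined = con.execute(
    "SELECT COUNT(*) FROM table1 t1 LEFT JOIN table2 t2 ON t1.ID = t2.ID"
).fetchone()[0]

rows = con.execute("""
    SELECT DISTINCT t1.ID, t1.data,
           CASE WHEN COUNT(*) OVER (PARTITION BY t1.ID, t1.data) > 1 THEN 1 ELSE 0 END
               AS table2_multiple
    FROM table1 t1
    LEFT JOIN table2 t2 ON t1.ID = t2.ID
    ORDER BY t1.ID
""").fetchall()

print(joined)
print(rows)
```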

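In this case, an inline view (T2) gets the counts for table2. Because the subquery returns only one row per ID, there are no duplicates to handle, and coalesce turns the missing count for IDs that are not in table2 into 0.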

SELECT T1.*, case when coalesce(t2.ValueNo,0) > 1 then 1 else 0 end as table2_Multiple 
FROM Table1 T1
LEFT JOIN (SELECT ID, count(*) as valueNo 
           FROM Table2 
           GROUP BY ID) T2
 on T1.ID = T2.ID
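To see that the inline view really returns one row per ID, the SQLite sketch below (standing in for Snowflake) runs the subquery on its own and then the full query; ID 002 is absent from the inline view, which is exactly the case coalesce handles.

```python
import sqlite3

# In-memory SQLite copy of the question's sample tables (SQLite, not Snowflake).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (ID TEXT, data TEXT)")
con.execute("CREATE TABLE table2 (ID TEXT, value TEXT)")
con.executemany("INSERT INTO table1 VALUES (?, ?)",
                [("001", "Walter"), ("002", "Skylar"), ("003", "Hank"), ("004", "Marie")])
con.executemany("INSERT INTO table2 VALUES (?, ?)",
                [("001", "apple"), ("001", "banana"), ("003", "grape"),
                 ("004", "graphite"), ("003", "jones"), ("001", "pear")])

# The inline view on its own: one row per ID that occurs in table2.
inline = con.execute(
    "SELECT ID, COUNT(*) AS valueNo FROM table2 GROUP BY ID ORDER BY ID"
).fetchall()

# Full query: IDs missing from the inline view get a NULL valueNo, coalesced to 0.
rows = con.execute("""
    SELECT t1.ID, t1.data,
           CASE WHEN COALESCE(t2.valueNo, 0) > 1 THEN 1 ELSE 0 END AS table2_multiple
    FROM table1 t1
    LEFT JOIN (SELECT ID, COUNT(*) AS valueNo FROM table2 GROUP BY ID) t2
      ON t1.ID = t2.ID
    ORDER BY t1.ID
""").fetchall()

print(inline)
print(rows)
```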

Tags: sql, join, snowflake-cloud-data-platform