Unioning complete and incomplete dataframes in Snowflake and SQL
I have a df that looks like this:
DF3
ID field1 field2 field3
001 banana 1 y
001 apple 1 y
004 orange 21 n
005 orange 32 y
This table, DF3, is the future state of DF2, which looks like this:
DF2
ID field1 field2
001 banana 1
001 apple 1
003 apple 1
004 orange 21
005 orange 32
And DF2 in turn follows DF1:
DF1
ID field1
001 banana
001 apple
002 banana
003 apple
004 orange
005 orange
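Think of DF3 as holding the complete records. I want the incomplete records from DF1 and DF2 in the same table as the complete records from DF3.
I would like my final result to look like this: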
ID field1 field2 field3
001 banana 1 y
001 apple 1 y
002 banana NULL NULL
003 apple 1 NULL
004 orange 21 n
005 orange 32 y
I think this can be done with some combination of UNIONs, but I am struggling to work out how to do it in Snowflake.
This looks like a left join:
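-- df1 has every (id, field1) pair, so start from it and pull in later fields
select df1.id, df1.field1, df2.field2, df3.field3
from df1 left join
     df2
     on df1.id = df2.id and
        df1.field1 = df2.field1 left join  -- match on field1, not field2
     df3
     on df2.id = df3.id and
        df2.field1 = df3.field1 and
        df2.field2 = df3.field2;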
Hmm, I would do a full outer join instead, in case the IDs are not completely in sync.
-- sample data from the question
with df3 as (
    select '001' ID, 'banana' field1, 1 field2, 'y' field3
    union all select '001' ID, 'apple' field1, 1 field2, 'y' field3
    union all select '004' ID, 'orange' field1, 21 field2, 'n' field3
    union all select '005' ID, 'orange' field1, 32 field2, 'y' field3)
, df2 as (
    select '001' ID, 'banana' field1, 1 field2
    union all select '001' ID, 'apple' field1, 1 field2
    union all select '003' ID, 'apple' field1, 1 field2
    union all select '004' ID, 'orange' field1, 21 field2
    union all select '005' ID, 'orange' field1, 32 field2)
, df1 as (
    select '001' ID, 'banana' field1
    union all select '001' ID, 'apple' field1
    union all select '002' ID, 'banana' field1
    union all select '003' ID, 'apple' field1
    union all select '004' ID, 'orange' field1
    union all select '005' ID, 'orange' field1)
select
    coalesce(df1.id, df2.id, df3.id) ID,
    coalesce(df1.field1, df2.field1, df3.field1) field1,
    coalesce(df2.field2, df3.field2) field2,
    df3.field3
from df1
-- join on (id, field1): ID 001 has two rows, so id alone would cross-match them
full outer join df2
    on df1.id = df2.id
   and df1.field1 = df2.field1
-- coalesce the keys so a row missing from df1 can still match df3
full outer join df3
    on coalesce(df1.id, df2.id) = df3.id
   and coalesce(df1.field1, df2.field1) = df3.field1
group by 1,2,3,4  -- drops exact duplicate rows
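For completeness, the UNION idea from the question can also be made to work. The following is only a sketch under some assumptions: `df1`, `df2` and `df3` are the tables from the question, the pair (`id`, `field1`) identifies a record, and a later table never holds a different non-NULL value for the same record. The `stacked` CTE name and the casts on the NULL padding columns are my own choices for illustration.

```sql
-- Stack all three tables, padding the missing columns with typed NULLs.
with stacked as (
    select id, field1, cast(null as integer) as field2, cast(null as varchar) as field3
    from df1
    union all
    select id, field1, field2, cast(null as varchar) as field3
    from df2
    union all
    select id, field1, field2, field3
    from df3
)
-- Collapse to one row per record. MAX ignores NULLs, so each column keeps
-- the known value if any table had one, and stays NULL otherwise.
select
    id,
    field1,
    max(field2) as field2,
    max(field3) as field3
from stacked
group by id, field1
order by id, field1;
```

On the sample data this should give the same six rows as the desired result, row order aside. If the same record could carry conflicting values across tables, MAX would pick one arbitrarily by sort order, and the join-based answers above would be the safer choice.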