How can I avoid duplicates using UNNEST and SPLIT in BigQuery SQL?
I have the following data:
Id | Historical_UTMs |
---|---|
1 | a,b,c,d;e,f,g,h; |
2 | i,j,k,l; |
3 | m,n,o,p;q,r,s,t;u,v,w,x; |
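I want to end up with the following:
Id | utm_Type | utm_Timestamp | utm_Web_Page | utm_Referrer |
---|---|---|---|---|
1 | a | b | c | d |
1 | e | f | g | h |
2 | i | j | k | l |
3 | m | n | o | p |
3 | q | r | s | t |
3 | u | v | w | x |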
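I want to split the contents of the Historical_UTMs field into separate rows (delimited by ;), keeping the Id field on every row, and then split each value in the new rows into separate columns (delimited by ,).
I have the following script, which creates a table containing the right information.
The problem is that all of the records are duplicated.
Can anyone help me understand why this script creates duplicate rows, and how to fix it?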
with Expanded as (
select
Lead.Id,
Lead.Historical_UTMs
from
`dataset.GS_UTMs` AS Lead,
unnest(split(Historical_UTMs,';')) AS History_UTMs
)
select
Expanded.Id,
split(Expanded.Historical_UTMs,',')[safe_offset(0)] as utm_Type,
split(Expanded.Historical_UTMs,',')[safe_offset(1)] as utm_Timestamp,
split(Expanded.Historical_UTMs,',')[safe_offset(2)] as utm_Web_Page,
split(Expanded.Historical_UTMs,',')[safe_offset(3)] as utm_Referrer,
from
Expanded
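If I understand correctly, the problem is that two similarly named things exist in the CTE, and you are selecting the wrong one: Lead.Historical_UTMs is the original, unsplit column, while History_UTMs is the element produced by unnest. Because the CTE returns the original column, every row created by the unnest carries the same full string, so the outer query splits that same string over and over. Perhaps something like this will work: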
with Expanded as (
  -- Select the unnested element, not the original Historical_UTMs column.
  select l.Id, Historical_UTM
  from `stormgeo-bigquery.Data_to_send_to_BigQuery_from_Google_Sheet.GS_UTMs` l cross join
       unnest(split(l.Historical_UTMs, ';')) as Historical_UTM
)
-- Offsets 0 to 3 follow the four-field sample data in the question.
select e.Id,
       split(e.Historical_UTM, ',')[safe_offset(0)] as utm_Type,
       split(e.Historical_UTM, ',')[safe_offset(1)] as utm_Timestamp,
       split(e.Historical_UTM, ',')[safe_offset(2)] as utm_Web_Page,
       split(e.Historical_UTM, ',')[safe_offset(3)] as utm_Referrer
from Expanded e;
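To see where the duplicates come from, it can help to look at the original column and the unnested element side by side. The following is a minimal sketch, not the asker's actual setup: the inline `sample` CTE is an assumption that stands in for the real table, built from the sample rows in the question.

```sql
-- Hypothetical inline data standing in for the GS_UTMs table (values from the question).
with sample as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs union all
  select 2, 'i,j,k,l;' union all
  select 3, 'm,n,o,p;q,r,s,t;u,v,w,x;'
)
select
  s.Id,
  s.Historical_UTMs,   -- original column: the full string, repeated on every row
  Historical_UTM       -- unnested element: one ';'-separated segment per row
from sample s
cross join unnest(split(s.Historical_UTMs, ';')) as Historical_UTM
order by s.Id;
```

Each Id appears once per segment, and the original column is identical on all of those rows, so splitting it by ',' gives the same values every time. The trailing ';' in the data also produces one empty segment per Id, which is worth filtering out or trimming before the second split.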
Consider the following:
select Id,
UTM[offset(0)] as utm_Type,
UTM[offset(1)] as utm_Timestamp,
UTM[offset(2)] as utm_Web_Page,
UTM[offset(3)] as utm_Referrer
from `project.dataset.GS_UTMs`,
unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
unnest([struct(split(Historical_UTM) as UTM)])
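Applied to the sample data in your question, the output is the six rows of the desired result: two rows for Id 1, one for Id 2, and three for Id 3.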
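For anyone who wants to try this approach without access to the table, here is a self-contained sketch. The inline `sample` CTE is an assumption standing in for `project.dataset.GS_UTMs`, and two details differ from the query above as illustrative choices: a `where` filter replaces `trim`, and `safe_offset` replaces `offset`.

```sql
-- Hypothetical inline data standing in for `project.dataset.GS_UTMs`.
with sample as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs union all
  select 2, 'i,j,k,l;' union all
  select 3, 'm,n,o,p;q,r,s,t;u,v,w,x;'
)
select
  Id,
  UTM[safe_offset(0)] as utm_Type,
  UTM[safe_offset(1)] as utm_Timestamp,
  UTM[safe_offset(2)] as utm_Web_Page,
  UTM[safe_offset(3)] as utm_Referrer
from sample,
  unnest(split(Historical_UTMs, ';')) as Historical_UTM,        -- one row per ';' segment
  unnest([struct(split(Historical_UTM, ',') as UTM)])           -- split each segment once, reuse as UTM
where Historical_UTM != ''   -- drop the empty segment left by the trailing ';'
order by Id, utm_Type;
```

The `where` filter also removes empty segments in the middle of a string (for example from a doubled `;;`), which `trim` would not. `safe_offset` returns NULL when a segment has fewer than four values, whereas `offset` raises an error in that case; which behavior is preferable depends on how strictly the data should be validated.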