How can I avoid duplicates using UNNEST and SPLIT in BigQuery SQL?

I have the following data:

Id Historical_UTMs
1 a,b,c,d;e,f,g,h;
2 i,j,k,l;
3 m,n,o,p;q,r,s,t;u,v,w,x;

I would like to end up with the following:

Id utm_Type utm_Timestamp utm_Web_Page utm_Referrer
1 a b c d
1 e f g h
2 i j k l
3 m n o p
3 q r s t
3 u v w x

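I want to split the contents of the Historical_UTMs field into separate rows (delimited by ;), keeping the Id field on every row, and then split each value in the new rows into separate columns (delimited by ,).

I have the following script, which creates a table with the right columns. The problem is that all the records are duplicated.

Can anyone help me understand why this script creates duplicate rows, and how to fix it?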

with Expanded as (
  select 
    Lead.Id,
    Lead.Historical_UTMs
  from
    `dataset.GS_UTMs` AS Lead,
    unnest(split(Historical_UTMs,';')) AS History_UTMs
)

select
  Expanded.Id,
  split(Expanded.Historical_UTMs,',')[safe_offset(0)] as utm_Type,
  split(Expanded.Historical_UTMs,',')[safe_offset(1)] as utm_Timestamp,
  split(Expanded.Historical_UTMs,',')[safe_offset(2)] as utm_Web_Page,
  split(Expanded.Historical_UTMs,',')[safe_offset(3)] as utm_Referrer,

from
  Expanded

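If I understand correctly, the problem is that the CTE selects the original Historical_UTMs column instead of the unnested History_UTMs value, so every row still carries the full string. Something like this should work: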

with Expanded as (
      -- select the unnested element, not the original column
      select l.Id, History_UTM
      from `stormgeo-bigquery.Data_to_send_to_BigQuery_from_Google_Sheet.GS_UTMs` l cross join
           unnest(split(l.Historical_UTMs, ';')) AS History_UTM
      -- the trailing ';' leaves an empty last element; skip it
      where History_UTM != ''
     )
select e.Id,
       split(e.History_UTM, ',')[safe_offset(0)] as utm_Type,
       split(e.History_UTM, ',')[safe_offset(1)] as utm_Timestamp,
       split(e.History_UTM, ',')[safe_offset(2)] as utm_Web_Page,
       split(e.History_UTM, ',')[safe_offset(3)] as utm_Referrer
from Expanded e;
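
To see where the duplicates in the question's query come from, here is a minimal sketch that puts the original column and the unnested element side by side. The `Sample` CTE is a hypothetical inline stand-in for the real table, used only for illustration.

```sql
-- Sample is a hypothetical stand-in for the GS_UTMs table (one row of the example data).
with Sample as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs
)
select
  s.Id,
  s.Historical_UTMs,  -- original column: the same full string on every unnested row
  History_UTM         -- unnested element: one ';'-delimited segment per row
from Sample s
cross join unnest(split(s.Historical_UTMs, ';')) as History_UTM
```

The cross join produces one row per segment, but `Historical_UTMs` is unchanged on each of them. Splitting that column on `,` therefore yields the same leading values on every row, which is what shows up as duplicates. The last row here should have an empty `History_UTM`, because the string ends with a `;`.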

Consider the approach below:

select Id, 
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from `project.dataset.GS_UTMs`,
unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
unnest([struct(split(Historical_UTM) as UTM)])        

If applied to the sample data in your question, the output matches the desired result shown there.
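
To try this without access to the real table, the sample rows from the question can be inlined. This is only a sketch: the `Sample` CTE and the `order by` clause are additions for illustration, and the rest follows the query above.

```sql
-- Sample is a hypothetical inline copy of the example data from the question.
with Sample as (
  select 1 as Id, 'a,b,c,d;e,f,g,h;' as Historical_UTMs union all
  select 2, 'i,j,k,l;' union all
  select 3, 'm,n,o,p;q,r,s,t;u,v,w,x;'
)
select Id,
  UTM[offset(0)] as utm_Type,
  UTM[offset(1)] as utm_Timestamp,
  UTM[offset(2)] as utm_Web_Page,
  UTM[offset(3)] as utm_Referrer
from Sample,
-- trim drops the trailing ';' so no empty segment is produced
unnest(split(trim(Historical_UTMs, ';'), ';')) Historical_UTM,
-- split once per segment and reuse the array for all four columns
unnest([struct(split(Historical_UTM) as UTM)])
order by Id, utm_Type
```

Note that `offset` raises an error if a segment has fewer than four values; `safe_offset` returns NULL instead, which may be preferable if the real data is less regular than the sample.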