BigQuery COALESCE for synonymous structs in wildcard query

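In the table chrome-ux-report.all.201910 and earlier, we had a field named experimental.first_input_delay. As of chrome-ux-report.all.201911, the same data has been renamed to first_input.delay.

Before this change, I used wildcard queries such as chrome-ux-report.all.* to aggregate data across all YYYYMM tables, but those queries now fail because the field names differ. I'm looking for a fix that can accommodate either the old or the new field name. Here is a simplified example: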

SELECT
  COALESCE(first_input.delay.histogram.bin, experimental.first_input_delay.histogram.bin) AS fid
FROM
  `chrome-ux-report.all.*`

This results in an error saying that first_input_delay does not exist in the schema of the experimental struct:

Error: Field name first_input_delay does not exist in STRUCT<time_to_first_byte STRUCT<histogram STRUCT<bin ARRAY<STRUCT<start INT64, end INT64, density FLOAT64>>>>>` at [2:58]

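Of course, the field does exist in that struct for some of the tables covered by the wildcard, but not for others. The validator seems to look only at the most recent table.

So my question is whether something like COALESCE can be used to accommodate a field that was renamed across tables. I understand that this schema makes it harder for us, and that a better solution would be a single partitioned table, but I'd like to hear whether this is solvable given our current setup.

Try the following: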

SELECT
  # Use COALESCE for every field that exists under a different name in the two sets of tables
  COALESCE(t1.first_input.delay.histogram.bin, t2.experimental.first_input_delay.histogram.bin) AS fid
FROM
  # t1 must be the tables with the new field name, t2 the tables with the old one
  (SELECT * FROM `tables-with-new-field`) t1
FULL OUTER JOIN
  (SELECT * FROM `tables-with-old-field`) t2
ON t1.primary_key = t2.primary_key

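I just edited the query. Let me know if it works.

The * wildcard unions the tables under a single schema, so COALESCE will only have one of the two columns available. Calling COALESCE with both columns as arguments will fail.

You will want to process each schema separately and then union the results.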

with old_stuff as (
  -- Process the old data
  select some stuff
  from `chrome-ux-report.all.*`
  where _TABLE_SUFFIX <= '201910'
),
new_stuff as (
  -- Process the new data
  select and rename some stuff
  from `chrome-ux-report.all.*`
  where _TABLE_SUFFIX >= '201911'
),
unioned as (
  select * from old_stuff 
  union all 
  select * from new_stuff
)
select * from unioned

Select, rename, and transform as needed in each CTE.

Try giving your users a view. A starting point could be:
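To make the skeleton above concrete, here is a minimal sketch for a few months around the rename. It assumes the column origin exists in every table and that the histogram bins have the same type under both names. As far as I know, filtering on _TABLE_SUFFIX does not change the schema a wildcard exposes (it is taken from the most recently created matching table), so the sketch uses a prefix that matches only old-schema tables instead of all.* with a filter. The output column names yyyymm and fid are my own choice.

```sql
WITH old_stuff AS (
  -- Old schema: this prefix matches only 201901..201909, so the wildcard
  -- schema should still contain experimental.first_input_delay.
  SELECT CONCAT('20190', _TABLE_SUFFIX) AS yyyymm, origin,
         experimental.first_input_delay.histogram.bin AS fid
  FROM `chrome-ux-report.all.20190*`
  UNION ALL
  SELECT '201910' AS yyyymm, origin,
         experimental.first_input_delay.histogram.bin AS fid
  FROM `chrome-ux-report.all.201910`
),
new_stuff AS (
  -- New schema: the same data lives under first_input.delay.
  SELECT '201911' AS yyyymm, origin,
         first_input.delay.histogram.bin AS fid
  FROM `chrome-ux-report.all.201911`
  UNION ALL
  SELECT '201912' AS yyyymm, origin,
         first_input.delay.histogram.bin AS fid
  FROM `chrome-ux-report.all.201912`
)
SELECT * FROM old_stuff
UNION ALL
SELECT * FROM new_stuff
```

Earlier years would need their own branches in old_stuff, and later months their own branches in new_stuff.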

CREATE OR REPLACE VIEW `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001`
AS
SELECT * EXCEPT(experimental)
  , experimental.first_input_delay.histogram.bin AS fid
  , CONCAT('2018', _table_suffix) ts
FROM `chrome-ux-report.all.2018*` 
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
  , CONCAT('20190', _table_suffix) ts
FROM `chrome-ux-report.all.20190*`  
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
  , '201910'
FROM `chrome-ux-report.all.201910`   
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
  , '201911'
FROM `chrome-ux-report.all.201911`   
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
  , '201912'
FROM `chrome-ux-report.all.201912`   

Now your users can run queries such as:

SELECT ts, origin, fid
FROM `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001` 
LIMIT 10
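Since fid is an array of histogram bins, users will probably want to flatten it. The following is a sketch only: it assumes the view above still exists and is readable, that each bin has the fields start and density as shown in the error message in the question, and it uses https://www.example.com as a placeholder origin.

```sql
SELECT
  ts,
  bin.start AS bin_start,          -- lower bound of the FID bucket
  SUM(bin.density) AS density      -- share of page loads in that bucket
FROM `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001`,
  UNNEST(fid) AS bin
WHERE origin = 'https://www.example.com'  -- placeholder origin
GROUP BY ts, bin_start
ORDER BY ts, bin_start
```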

PS: these tables really need clustering. If they were clustered, this query would process significantly fewer bytes.
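If you control a copy of the data, one way to get clustering is to materialize a clustered table yourself. This is only a sketch: my_project.my_dataset.crux_201912_clustered is a hypothetical destination, and clustering by origin is my assumption about the most common filter. Note that the copy itself scans the full source table once.

```sql
-- Hypothetical destination table; replace with a dataset you own.
CREATE TABLE `my_project.my_dataset.crux_201912_clustered`
CLUSTER BY origin
AS
SELECT *
FROM `chrome-ux-report.all.201912`
```

Queries that filter on origin against the clustered copy should then be able to skip blocks that cannot contain the requested origin.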