BigQuery COALESCE for synonymous structs in wildcard query

Question

In the table chrome-ux-report.all.201910 and earlier versions, we have a field named experimental.first_input_delay. Starting with chrome-ux-report.all.201911, the same data has been renamed to first_input.delay.

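Before this change, I used wildcard queries like chrome-ux-report.all.* to aggregate all YYYYMM data, but those queries now fail because the field names differ. I'm looking for a fix that can accommodate either the old or the new field name. Here is a simplified example: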

SELECT
  COALESCE(first_input.delay.histogram.bin, experimental.first_input_delay.histogram.bin) AS fid
FROM
  `chrome-ux-report.all.*`

This results in an error saying that first_input_delay does not exist in the schema of the experimental struct:

Error: Field name first_input_delay does not exist in STRUCT<time_to_first_byte STRUCT<histogram STRUCT<bin ARRAY<STRUCT<start INT64, end INT64, density FLOAT64>>>>> at [2:58]

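Of course, the field does exist in that struct for some of the tables covered by the wildcard, just not for the others. The validator seems to look only at the most recent table.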
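To see which monthly tables carry which spelling of the field, the dataset's INFORMATION_SCHEMA can be queried. This is a sketch rather than part of the original question, and it assumes that the COLUMN_FIELD_PATHS view is readable for the public chrome-ux-report project and that its field_path values are dotted paths such as first_input.delay.histogram.bin.start:

```sql
-- One row per monthly table, flagging which FID field name it exposes.
-- STARTS_WITH matches leaf paths too, so this does not depend on whether
-- intermediate struct paths are listed.
SELECT
  table_name,
  LOGICAL_OR(STARTS_WITH(field_path, 'experimental.first_input_delay.')) AS has_old_name,
  LOGICAL_OR(STARTS_WITH(field_path, 'first_input.delay.')) AS has_new_name
FROM `chrome-ux-report.all.INFORMATION_SCHEMA.COLUMN_FIELD_PATHS`
GROUP BY table_name
ORDER BY table_name;
```

Tables where only one of the two flags is true are exactly the ones that break a COALESCE naming both fields.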

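So my question is: can something like COALESCE be used to accommodate a field that was renamed across tables? I know the schema makes this harder for us and that a better solution would be a single partitioned table, but I'd like to hear whether this is solvable with our current setup.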

Answer 1

Try the following:

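SELECT
  # Use COALESCE for every field that exists under different names in the two tables.
  # t1 holds the old name (experimental.first_input_delay), t2 the new one (first_input.delay).
  COALESCE(t1.experimental.first_input_delay.histogram.bin, t2.first_input.delay.histogram.bin) AS fid
FROM
(SELECT * FROM  `tables-with-old-field`) t1 FULL OUTER JOIN
(SELECT * FROM  `tables-with-new-field`) t2
# `tables-with-old-field`, `tables-with-new-field` and primary_key are placeholders.
ON t1.primary_key = t2.primary_key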

Just edited the query. Let me know if it works.

Answer 2

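The * wildcard unions the tables under a single schema, so only one of the two columns is available to COALESCE. When you call COALESCE with both columns as arguments, it fails.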

You'll want to handle each schema separately and then union the results.

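with old_stuff as (
  -- Process the old data: FID lives under experimental.first_input_delay.
  -- Each wildcard prefix matches only old-schema tables, because BigQuery takes a
  -- wildcard's schema from the most recently created matching table, even when
  -- a _TABLE_SUFFIX filter excludes that table.
  select concat('2018', _TABLE_SUFFIX) as yyyymm, origin,
    experimental.first_input_delay.histogram.bin as fid
  from `chrome-ux-report.all.2018*`
  union all
  select concat('20190', _TABLE_SUFFIX), origin, experimental.first_input_delay.histogram.bin
  from `chrome-ux-report.all.20190*`
  union all
  select '201910', origin, experimental.first_input_delay.histogram.bin
  from `chrome-ux-report.all.201910`
),
new_stuff as (
  -- Process the new data: the same histogram is now first_input.delay.
  -- Assumes the most recently created table still has first_input.delay.
  select _TABLE_SUFFIX as yyyymm, origin, first_input.delay.histogram.bin as fid
  from `chrome-ux-report.all.*`
  where _TABLE_SUFFIX >= '201911'
),
unioned as (
  select * from old_stuff
  union all
  select * from new_stuff
)
select * from unioned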

Select, rename, and transform as needed in each CTE.

Answer 3

Try providing a view for your users. A starting point could be:

CREATE OR REPLACE VIEW `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001`
AS
SELECT * EXCEPT(experimental)
  , experimental.first_input_delay.histogram.bin AS fid
  , CONCAT('2018', _table_suffix) ts
FROM `chrome-ux-report.all.2018*` 
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
  , CONCAT('20190', _table_suffix) ts
FROM `chrome-ux-report.all.20190*`  
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental), experimental.first_input_delay.histogram.bin
  , '201910'
FROM `chrome-ux-report.all.201910`   
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
  , '201911'
FROM `chrome-ux-report.all.201911`   
UNION ALL
SELECT * EXCEPT(largest_contentful_paint,  experimental, first_input, layout_instability), first_input.delay.histogram.bin
  , '201912'
FROM `chrome-ux-report.all.201912`

Now your users can run queries such as:

SELECT ts, origin, fid
FROM `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001` 
LIMIT 10
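Because fid keeps the histogram as an array of bins, aggregating it across months means unnesting it. A hedged sketch, assuming the view above exists as defined, that each bin has the start and density fields shown in the error message in the question, and that the densities of all rows for one origin in one month add up to 1; 'https://example.com' is a hypothetical origin, and the 100 ms cut-off is the Web Vitals "good" threshold for FID:

```sql
-- Per month, sum the density of histogram bins that start below 100 ms
-- for one (hypothetical) origin.
SELECT
  ts,
  ROUND(SUM(IF(bin.start < 100, bin.density, 0)), 4) AS fast_fid
FROM `fh-bigquery.public_dump.chrome_ux_experimental_input_delay_view_202001`,
  UNNEST(fid) AS bin
WHERE origin = 'https://example.com'
GROUP BY ts
ORDER BY ts;
```

The same query works unchanged on both sides of the 201910/201911 rename, since the view already maps both field names onto fid.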

PS: These tables really need clustering. If they were clustered, this query would process significantly fewer bytes.
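A minimal sketch of that clustering, assuming you copy a month into a dataset you control (my_project.my_dataset.crux_201912 is a hypothetical name) and that your project allows clustering a table without partitioning it:

```sql
-- Copy one month into a table clustered by origin, so that queries filtering
-- on origin can skip storage blocks that hold other origins.
CREATE TABLE `my_project.my_dataset.crux_201912`
CLUSTER BY origin
AS
SELECT *
FROM `chrome-ux-report.all.201912`;

-- Example lookup for a single (hypothetical) origin.
SELECT origin, first_input.delay.histogram.bin AS fid
FROM `my_project.my_dataset.crux_201912`
WHERE origin = 'https://example.com';
```

The savings only apply when a query filters on the clustering column; a query that reads every origin still scans the whole table.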

google-bigquery

chrome-ux-report