Create an array of key-value structs from regex matches

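Given the following table of items and a predefined list of patterns (modern, rustic, contemporary, classic, vintage), how can I create another table containing the regex matches organized by each item's source (source_1, source_2, etc.)?

Each match should be a key-value struct, i.e. STRUCT<pattern STRING, source ARRAY<STRING>>, and each row should hold an array of these structs, i.e. ARRAY<STRUCT<pattern STRING, source ARRAY<STRING>>>.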

Items table:

with items_for_sale as (
    select 1 as item_id, 'modern chair' as source_1, ['contemporary chair', 'modernist chair'] as source_2
    union all
    select 2 as item_id, 'classic lamp' as source_1, ['modern vintage lamp', 'blah'] as source_2
    union all
    select 3 as item_id, 'rustic bed' as source_1, ['cottage bed', 'vintage country bed'] as source_2
)
select * from items_for_sale

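The predefined list of patterns to search for is, e.g., modern, rustic, contemporary, classic, vintage (the actual list has ~1000 entries). The regex is expected to find patterns that the string contains.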

Expected output table, with regex matches per item by source:

This is straightforward in Python or any other language by building a key-value dictionary for each item_id, but is it possible to do this in BQ SQL?
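For reference, the dictionary approach mentioned above might look roughly like the following in Python. This is only a sketch of the target shape, not part of the original question: the helper name `matches_by_source` and the `result` variable are hypothetical, the rows are copied from the sample `items_for_sale` data, and each pattern is assumed to be usable directly as a regular expression.

```python
import re

# Sample rows copied from the items_for_sale CTE in the question.
items_for_sale = [
    {"item_id": 1, "source_1": "modern chair",
     "source_2": ["contemporary chair", "modernist chair"]},
    {"item_id": 2, "source_1": "classic lamp",
     "source_2": ["modern vintage lamp", "blah"]},
    {"item_id": 3, "source_1": "rustic bed",
     "source_2": ["cottage bed", "vintage country bed"]},
]
patterns = ["modern", "rustic", "contemporary", "classic", "vintage"]


def matches_by_source(row, patterns):
    """Hypothetical helper: list of {"pattern", "source"} dicts for one row."""
    # Normalise every source column to a list of strings.
    sources = {
        "source_1": [row["source_1"]],
        "source_2": row["source_2"],
    }
    matches = []
    for pattern in sorted(patterns):
        # Assumption: the pattern is treated as a regex and may match anywhere.
        hit_sources = [
            name
            for name, values in sources.items()
            if any(re.search(pattern, value) for value in values)
        ]
        if hit_sources:
            matches.append({"pattern": pattern, "source": hit_sources})
    return matches


# One entry per item_id, mirroring ARRAY<STRUCT<pattern, source ARRAY<STRING>>>.
result = {row["item_id"]: matches_by_source(row, patterns) for row in items_for_sale}
```

Here `result[1]` would list `contemporary` under `source_2` only and `modern` under both sources, since 'modernist chair' contains 'modern'.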

Consider the simple approach below:

-- Assumes the items_for_sale CTE from the question is defined in the same WITH clause.
with patterns as (
  select pattern
  from unnest(['modern', 'rustic', 'contemporary', 'classic', 'vintage']) pattern
)
select item_id,
  -- one struct per (pattern, source) pair that matched
  array_agg(struct(pattern, source) order by pattern, source) regexp_matches_by_source
from (
  -- flatten both source columns into (item_id, value, source) rows
  select item_id, source_1 as value, 'source_1' as source from items_for_sale union all
  select item_id, source_2, 'source_2' from items_for_sale t, t.source_2 as source_2
)
join patterns
on regexp_contains(value, pattern)
group by item_id

If applied to the sample data in your question, the output is: