Create an array of key-value structs from regex matches
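Given the following table of items and a predefined list of patterns (modern, rustic, contemporary, classic, vintage), how can I create another table that contains the regex matches organized by each item's source (source_1, source_2, etc.)?
Each match should be a key-value struct, i.e. STRUCT<pattern STRING, source ARRAY<STRING>>, and each row should contain an array of these structs, i.e. ARRAY<STRUCT<pattern STRING, source ARRAY<STRING>>>.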
Items table:
-- sample data: source_1 is a STRING, source_2 is an ARRAY<STRING>
with items_for_sale AS (
  select 1 as item_id, 'modern chair' as source_1, ['contemporary chair', 'modernist chair'] as source_2
  union all
  select 2 as item_id, 'classic lamp' as source_1, ['modern vintage lamp', 'blah'] as source_2
  union all
  select 3 as item_id, 'rustic bed' as source_1, ['cottage bed', 'vintage country bed'] as source_2
)
select * from items_for_sale
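Predefined list of patterns to search for, e.g.
modern, rustic, contemporary, classic, vintage
(the real list has ~1000 items). A regex should match whenever the string contains the pattern anywhere, not only when the whole string equals it.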
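The "contains" requirement is an unanchored search: a pattern matches if it occurs anywhere in the string, which is also how BigQuery's REGEXP_CONTAINS behaves. A minimal Python sketch of the difference, assuming the patterns are ordinary regular expressions (the helpers contains and matches_whole are hypothetical names). re.escape is shown only for the case, assumed here, that some of the ~1000 patterns are literal strings containing regex metacharacters:

```python
import re

def contains(value: str, pattern: str) -> bool:
    """True if the pattern matches anywhere in value (unanchored search)."""
    return re.search(pattern, value) is not None

def matches_whole(value: str, pattern: str) -> bool:
    """True only if the pattern matches the entire string."""
    return re.fullmatch(pattern, value) is not None

print(contains("modernist chair", "modern"))       # True
print(matches_whole("modernist chair", "modern"))  # False

# A literal pattern with metacharacters, escaped before searching.
print(contains("chair (modern)", re.escape("(modern)")))  # True
```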
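Expected output table, with each item's regex matches organized by source:
This is pretty straightforward in Python or any other language by building a key-value dictionary for each item_id, but is it possible to do this in BigQuery SQL?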
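For reference, the per-item dictionary approach mentioned in the question might look like the following minimal Python sketch. It uses the sample data and the short example pattern list; the function name matches_by_source is hypothetical, and patterns in each item's list are sorted alphabetically as an arbitrary choice:

```python
import re

# Sample data from the question.
items_for_sale = [
    {"item_id": 1, "source_1": "modern chair",
     "source_2": ["contemporary chair", "modernist chair"]},
    {"item_id": 2, "source_1": "classic lamp",
     "source_2": ["modern vintage lamp", "blah"]},
    {"item_id": 3, "source_1": "rustic bed",
     "source_2": ["cottage bed", "vintage country bed"]},
]
patterns = ["modern", "rustic", "contemporary", "classic", "vintage"]

def matches_by_source(item: dict) -> list[dict]:
    """Return [{'pattern': p, 'source': [sources where p matched]}] for one item."""
    found: dict[str, list[str]] = {}
    for source in ("source_1", "source_2"):
        value = item[source]
        # source_1 holds a single string, source_2 a list of strings.
        values = value if isinstance(value, list) else [value]
        for pattern in patterns:
            if any(re.search(pattern, v) for v in values):
                found.setdefault(pattern, []).append(source)
    return [{"pattern": p, "source": found[p]} for p in sorted(found)]

result = {item["item_id"]: matches_by_source(item) for item in items_for_sale}
for item_id, matches in result.items():
    print(item_id, matches)
```

Each value of result has the shape ARRAY<STRUCT<pattern STRING, source ARRAY<STRING>>> asked for; for item 1, modern maps to both source_1 and source_2.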
Consider the following simple approach:
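with patterns as (
  select pattern
  from unnest(['modern', 'rustic', 'contemporary', 'classic', 'vintage']) pattern
)
select item_id,
  array_agg(struct(pattern, source) order by pattern) regexp_matches_by_source
from (
  -- one row per (item_id, pattern), with the distinct sources it matched in
  select item_id, pattern, array_agg(distinct source order by source) source
  from (
    -- flatten: one row per string, tagged with the column it came from
    select item_id, source_1 as value, 'source_1' as source from items_for_sale union all
    select item_id, element, 'source_2' from items_for_sale t, t.source_2 as element
  )
  join patterns
  on regexp_contains(value, pattern)
  group by item_id, pattern
)
group by item_id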
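To trace what the query does on the sample data, here is a minimal Python sketch that mirrors its flatten (UNION ALL over source_1 and the unnested source_2) and join (REGEXP_CONTAINS) steps. It is only an illustration; BigQuery does not execute the query this way. The joined rows show that for item 1, modern matches in both source_1 and source_2, so those rows must be collapsed into one struct per (item_id, pattern) to get source as an ARRAY<STRING>:

```python
import re
from itertools import groupby

items_for_sale = [
    (1, "modern chair", ["contemporary chair", "modernist chair"]),
    (2, "classic lamp", ["modern vintage lamp", "blah"]),
    (3, "rustic bed", ["cottage bed", "vintage country bed"]),
]
patterns = ["modern", "rustic", "contemporary", "classic", "vintage"]

# Flatten: one (item_id, value, source) row per string.
flat = [(i, s1, "source_1") for i, s1, _ in items_for_sale] + [
    (i, v, "source_2") for i, _, s2 in items_for_sale for v in s2
]

# Join: one (item_id, pattern, source) row per string/pattern match.
joined = [(i, p, src) for i, v, src in flat for p in patterns if re.search(p, v)]
for row in joined:
    print(row)

# Group by (item_id, pattern), keeping distinct sources, then collect per item.
result: dict[int, list[dict]] = {}
for (item_id, pattern), grp in groupby(sorted(set(joined)), key=lambda r: (r[0], r[1])):
    sources = sorted({src for _, _, src in grp})
    result.setdefault(item_id, []).append({"pattern": pattern, "source": sources})
```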
If applied to the sample data in your question, the output is