How to find and extract a substring in BigQuery
I have a string column in a BigQuery table, for example:
name |
---|
WW_for_all_feed |
EU_param_1_for_all_feed |
AU_for_all_full_settings_18+ |
WW_for_us_param_5_for_us_feed |
WW_for_us_param_5_feed |
WW_for_all_25+ |
I also have a list of variables, for example:
param_1_for_all
param_5_for_us
param_5
full_settings
If a string in the name column contains one of these substrings, I need to extract it:
name | param |
---|---|
WW_for_all_feed | None |
EU_param_1_for_all_feed | param_1_for_all |
AU_for_all_full_settings_18+ | full_settings |
WW_for_us_param_5_for_us_feed | param_5_for_us |
WW_for_us_param_5_feed | param_5 |
WW_for_all_25+ | None |
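I wanted to try a regular expression with replace, but I don't know what pattern to use to find the substring.
Use the query below: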
select name, param
from your_table
left join params
on regexp_contains(name, param)
If applied to the sample data in your question:
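-- sample data copied from the question
with your_table as (
select 'WW_for_all_feed' name union all
select 'EU_param_1_for_all_feed' union all
select 'AU_for_all_full_settings_18+' union all
select 'WW_for_us_param_5_for_us_feed' union all
select 'WW_for_all_25+'
), params as (
select 'param_1_for_all' param union all
select 'param_5_for_us' union all
select 'full_settings'
)
-- the query from above, run against the sample data
select name, param
from your_table
left join params
on regexp_contains(name, param)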
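the output is (row order is not guaranteed without an order by):
name | param |
---|---|
WW_for_all_feed | null |
EU_param_1_for_all_feed | param_1_for_all |
AU_for_all_full_settings_18+ | full_settings |
WW_for_us_param_5_for_us_feed | param_5_for_us |
WW_for_all_25+ | null |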
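One caveat with the join above: regexp_contains treats param as a regular expression, so a parameter containing characters such as + or . would be interpreted as a pattern rather than as literal text. The following is a minimal sketch, assuming the parameters should always be matched literally; it swaps in strpos, which returns the 1-based position of the substring or 0 when it is absent. The two sample rows per table are taken from the question and only illustrate the idea.

```sql
-- Sketch: literal substring match instead of a regular-expression match.
-- Assumption: params are plain text, not patterns.
with your_table as (
  select 'EU_param_1_for_all_feed' name union all
  select 'WW_for_all_25+'
), params as (
  select 'param_1_for_all' param union all
  select 'full_settings'
)
select name, param
from your_table
left join params
on strpos(name, param) > 0  -- 0 means "not found", so > 0 means "contains"
```

For the parameters shown in the question both conditions behave the same, since none of them contains a regex metacharacter.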
"But I have another issue (updated question): what if one of the params is a substring of another?"
Then use the query below:
select name, string_agg(param order by length(param) desc limit 1) param
from your_table
left join params
on regexp_contains(name, param)
group by name
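If applied to the updated sample data, the output is:
name | param |
---|---|
WW_for_all_feed | null |
EU_param_1_for_all_feed | param_1_for_all |
AU_for_all_full_settings_18+ | full_settings |
WW_for_us_param_5_for_us_feed | param_5_for_us |
WW_for_us_param_5_feed | param_5 |
WW_for_all_25+ | null |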
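For convenience, here is a self-contained sketch of the longest-match query, assuming the updated sample data is exactly the six names and four variables listed in the question. The CTE names your_table and params follow the answer above.

```sql
-- Sketch: pick the longest matching param per name.
with your_table as (
  select 'WW_for_all_feed' name union all
  select 'EU_param_1_for_all_feed' union all
  select 'AU_for_all_full_settings_18+' union all
  select 'WW_for_us_param_5_for_us_feed' union all
  select 'WW_for_us_param_5_feed' union all
  select 'WW_for_all_25+'
), params as (
  select 'param_1_for_all' param union all
  select 'param_5_for_us' union all
  select 'param_5' union all
  select 'full_settings'
)
select
  name,
  -- when several params match, keep only the longest one;
  -- names with no match keep a null param from the left join
  string_agg(param order by length(param) desc limit 1) param
from your_table
left join params
on regexp_contains(name, param)
group by name
```

Ordering by length works here because a param that is a substring of another matching param is always the shorter of the two, so param_5_for_us wins over param_5 whenever both match.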