How to split a string using regex?
I am trying to split ad_content on every "_" character, but I can't figure out why I can't get past the ninth split word (splits[SAFE_OFFSET(8)] AS objective).
This is the query I am using:
SELECT
ad_content,
splits[SAFE_OFFSET(0)] AS country,
splits[SAFE_OFFSET(1)] AS product,
splits[SAFE_OFFSET(2)] AS budget,
splits[SAFE_OFFSET(3)] AS source,
splits[SAFE_OFFSET(4)] AS campaign,
splits[SAFE_OFFSET(5)] AS audience,
splits[SAFE_OFFSET(6)] AS route_type,
splits[SAFE_OFFSET(7)] AS business,
splits[SAFE_OFFSET(8)] AS objective,
splits[SAFE_OFFSET(9)] AS format,
splits[SAFE_OFFSET(10)] AS nnn,
splits[SAFE_OFFSET(11)] AS date,
FROM (
SELECT
AD_CONTENT,
SPLIT(REGEXP_REPLACE(
AD_CONTENT,
r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'),
'|') AS splits
FROM ga_digital_marketing
)
For example, ad_content = us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906
This is the result of the query above:
ad_content | country | product | budget | source | campaign | audience | route_type | business | objective | format | nnn | date |
---|---|---|---|---|---|---|---|---|---|---|---|---|
us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906 | us | latam | perf | facebook | black-friday | bbdd-push | SCL-CCP | domestic | conversion | us0 | us1 | us2 |
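As you can see above, the format column (splits[SAFE_OFFSET(9)] AS format) does not return the correct result, and neither do nnn and date.
I think the problem is here: r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12')
Possibly the 0 in |\10 is not being read as part of the group number but as a literal character, so \10 becomes group 1 followed by "0". That would explain why I get us0, us1 and us2.
Is there a solution for this limitation?
Is there another way to split the ad_content example?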
BigQuery's REGEXP_REPLACE only supports \1 through \9 as backreferences in the replacement string. That is the reason!
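To see the limit in isolation, here is a minimal sketch that can be pasted into the BigQuery console. The input literal and the `replaced` alias are made up for illustration and are not from the question. Only `\9` and `\10` are kept in the replacement so the difference is easy to spot.

```sql
-- Hypothetical one-row reproduction; the literal 'a_b_..._j' is illustrative only.
SELECT
  REGEXP_REPLACE(
    'a_b_c_d_e_f_g_h_i_j',
    r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)',
    r'\9|\10'
  ) AS replaced;
-- \9 refers to the ninth group ("i").
-- \10 is not treated as group 10; it is read as \1 followed by a literal "0",
-- the same pattern as the us0 / us1 / us2 values in the question.
```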
Is there a solution for this limitation?
Use the approach below instead:
SELECT
-- ad_content,
splits[SAFE_OFFSET(0)] AS country,
splits[SAFE_OFFSET(1)] AS product,
splits[SAFE_OFFSET(2)] AS budget,
splits[SAFE_OFFSET(3)] AS source,
splits[SAFE_OFFSET(4)] AS campaign,
splits[SAFE_OFFSET(5)] AS audience,
splits[SAFE_OFFSET(6)] AS route_type,
splits[SAFE_OFFSET(7)] AS business,
splits[SAFE_OFFSET(8)] AS objective,
splits[SAFE_OFFSET(9)] AS format,
splits[SAFE_OFFSET(10)] AS nnn,
splits[SAFE_OFFSET(11)] AS date,
FROM (
SELECT
AD_CONTENT,
SPLIT(AD_CONTENT, '_') AS splits
FROM ga_digital_marketing
)
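Applied to the example in your question, this returns each of the 12 underscore-separated tokens in its own column, so format = push, nnn = all and date = 20210906.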
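If you would rather stay with a regex, as the title asks, one possible variant is REGEXP_EXTRACT_ALL, which returns every match as an array and needs no backreferences at all. The sketch below is self-contained: the WITH clause is a hypothetical stand-in for the real ga_digital_marketing table and holds only the sample value from the question.

```sql
-- Stand-in for the real table, holding just the sample row from the question.
WITH ga_digital_marketing AS (
  SELECT 'us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906' AS ad_content
)
SELECT
  splits[SAFE_OFFSET(8)] AS objective,
  splits[SAFE_OFFSET(9)] AS format,
  splits[SAFE_OFFSET(10)] AS nnn,
  splits[SAFE_OFFSET(11)] AS date
FROM (
  SELECT
    -- Each run of non-underscore characters becomes one array element.
    REGEXP_EXTRACT_ALL(ad_content, r'[^_]+') AS splits
  FROM ga_digital_marketing
);
```

The two approaches differ on consecutive underscores: SPLIT(ad_content, '_') keeps an empty string for the missing token, so later positions stay aligned, while the [^_]+ pattern skips it and shifts everything after it left by one. For fixed-position fields like these, SPLIT is probably the safer choice.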