How to split a string using regex?

I'm trying to split ad_content on every "_" character, but I can't figure out why I can't get past the 9th split word (splits[SAFE_OFFSET(8)] AS objective).

This is the query I'm using:

SELECT
    ad_content,
    splits[SAFE_OFFSET(0)] AS country,
    splits[SAFE_OFFSET(1)] AS product,
    splits[SAFE_OFFSET(2)] AS budget,
    splits[SAFE_OFFSET(3)] AS source,
    splits[SAFE_OFFSET(4)] AS campaign,
    splits[SAFE_OFFSET(5)] AS audience,
    splits[SAFE_OFFSET(6)] AS route_type,
    splits[SAFE_OFFSET(7)] AS business,
    splits[SAFE_OFFSET(8)] AS objective,
    splits[SAFE_OFFSET(9)] AS format,
    splits[SAFE_OFFSET(10)] AS nnn,
    splits[SAFE_OFFSET(11)] AS date,
FROM (
  SELECT
    AD_CONTENT,
    SPLIT(REGEXP_REPLACE(
            AD_CONTENT,
            r'([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_([^_]+)_(.+)',
            r'\1|\2|\3|\4|\5|\6|\7|\8|\9|\10|\11|\12'),
          '|') AS splits
  FROM ga_digital_marketing
)

For example, ad_content = us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906

This is the result of the query above:

ad_content: us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906
country: us
product: latam
budget: perf
source: facebook
campaign: black-friday
audience: bbdd-push
route_type: SCL-CCP
business: domestic
objective: conversion
format: us0
nnn: us1
date: us2

As you can see above, the format column (splits[SAFE_OFFSET(9)] AS format) does not return the correct value.

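I think the problem is in the replacement string passed to REGEXP_REPLACE: maybe the 0 in |\10 is not recognized as part of the group number but as a literal character. That would explain why I get us0, us1 and us2.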

Is there a solution for this limitation?

Is there another way to split the ad_content example?

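BigQuery's REGEXP_REPLACE only supports single-digit back-references, \1 to \9, in the replacement string, so \10 is read as \1 followed by a literal 0. That's why!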
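The rule can be seen in isolation with a literal string that has ten capture groups. This is a minimal sketch on made-up input ('abcdefghij'), not the asker's data: \9 returns the ninth group, while \10 returns the first group followed by a literal 0, which is the same pattern as us0/us1/us2 in the question.

```sql
-- Minimal sketch on a made-up literal: ten capture groups, one letter each.
-- The pattern matches the whole string, so the result is just the replacement.
SELECT
  -- \9 is the ninth group: returns 'i'
  REGEXP_REPLACE('abcdefghij', r'(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)', r'\9') AS ninth,
  -- \10 is parsed as \1 followed by a literal '0': returns 'a0', not 'j'
  REGEXP_REPLACE('abcdefghij', r'(a)(b)(c)(d)(e)(f)(g)(h)(i)(j)', r'\10') AS tenth
```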

Is there a solution for this limitation?

Use the approach below instead:

SELECT
    -- ad_content,
    splits[SAFE_OFFSET(0)] AS country,
    splits[SAFE_OFFSET(1)] AS product,
    splits[SAFE_OFFSET(2)] AS budget,
    splits[SAFE_OFFSET(3)] AS source,
    splits[SAFE_OFFSET(4)] AS campaign,
    splits[SAFE_OFFSET(5)] AS audience,
    splits[SAFE_OFFSET(6)] AS route_type,
    splits[SAFE_OFFSET(7)] AS business,
    splits[SAFE_OFFSET(8)] AS objective,
    splits[SAFE_OFFSET(9)] AS format,
    splits[SAFE_OFFSET(10)] AS nnn,
    splits[SAFE_OFFSET(11)] AS date,
FROM (
  SELECT
    AD_CONTENT,
    SPLIT(AD_CONTENT, '_') AS splits
  FROM ga_digital_marketing
)    
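To try the SPLIT approach without access to ga_digital_marketing, one option is to feed it a one-row CTE holding the example value from the question. This is a sketch: the CTE name sample_rows is hypothetical and only stands in for the real table. It also shows that SAFE_OFFSET returns NULL rather than failing when an ad_content value has fewer fields than expected.

```sql
-- Sketch: the SPLIT approach run against a one-row sample.
-- `sample_rows` is a hypothetical stand-in for ga_digital_marketing.
WITH sample_rows AS (
  SELECT 'us_latam_perf_facebook_black-friday_bbdd-push_SCL-CCP_domestic_conversion_push_all_20210906' AS ad_content
)
SELECT
  ARRAY_LENGTH(splits)    AS n_fields,    -- 12 fields for this value
  splits[SAFE_OFFSET(9)]  AS format,      -- 'push': the 10th field, no \9 limit applies
  splits[SAFE_OFFSET(11)] AS date,        -- '20210906'
  splits[SAFE_OFFSET(12)] AS beyond_end   -- NULL: SAFE_OFFSET does not raise past the end
FROM (
  SELECT SPLIT(ad_content, '_') AS splits
  FROM sample_rows
)
```

SPLIT keeps empty strings when two underscores appear in a row; REGEXP_EXTRACT_ALL(ad_content, r'[^_]+') skips them instead, though the offsets of later fields then shift for such rows.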

If applied to the example in your question, the output is: