BigQuery check for array overlap

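I'm writing a BigQuery query that needs to check whether any of several strings appears as an element in one of a table's columns, where the column in question itself contains arrays of strings. For context, I'm writing the query as part of a small automated Python job, and I'm using Standard SQL.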

I couldn't find anything here that explicitly checks array containment: https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators

So I came up with a solution that uses a pretty hacky regular expression, specifically:

...other query stuff...

WHERE
    REGEXP_CONTAINS((LOWER(ARRAY_TO_STRING(column, '-'))), r"({joined_string})")

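...where column is the column I care about in the table, and joined_string is one long string made up of all the strings I need to check, joined with | (where | acts as the regex OR operator).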
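To make the weakness of the regex approach concrete, here is a minimal Python sketch of how joined_string might be built and what the pattern ends up matching. The helper names (build_joined_string, regex_overlap, exact_overlap) are hypothetical, Python's re module stands in for BigQuery's regex engine, and lowercasing the search strings is an assumption made to mirror the LOWER(...) call in the query.

```python
import re

def build_joined_string(needles):
    """Join the search strings with '|', escaping regex metacharacters first."""
    return "|".join(re.escape(s.lower()) for s in needles)

def regex_overlap(column, needles, sep="-"):
    """Local stand-in for
    REGEXP_CONTAINS(LOWER(ARRAY_TO_STRING(column, sep)), r"(joined_string)")."""
    haystack = sep.join(column).lower()
    pattern = "(" + build_joined_string(needles) + ")"
    return re.search(pattern, haystack) is not None

def exact_overlap(column, needles):
    """True only when a whole array element equals one of the search strings."""
    return not set(column).isdisjoint(needles)

# 'ab' is not an element of the array, but the regex still finds it inside 'abc'.
print(regex_overlap(["abc", "def"], ["ab"]))   # True
print(exact_overlap(["abc", "def"], ["ab"]))   # False

# A search string can even match across the separator between two elements.
print(regex_overlap(["abc", "def"], ["c-d"]))  # True
```

So the regex version reports a match on substrings and on text that spans two elements, which is why an element-wise comparison is the safer check.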

Is there some built-in functionality in BigQuery Standard SQL that would let me do this in a saner way?

Below are two examples.

The first assumes your strings are in another table, strings:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT 'abc' AS str UNION ALL
  SELECT '123' UNION ALL
  SELECT '456'
)
SELECT *
FROM yourTable
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) > 0  

If you need to see how many strings matched, you can add the following to the SELECT list:
(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN strings ON col = str) AS cnt

The second example assumes you have the list of strings packed in an array:

#standardSQL
WITH yourTable AS (
  SELECT 1 AS id, ['abc', 'def', 'xyz'] AS column UNION ALL
  SELECT 2, ['123', '456', '789'] UNION ALL
  SELECT 3, ['135', '246', '369'] 
),
strings AS (
  SELECT ['abc', 'def', '456'] AS strs
)
SELECT yourTable.*
FROM yourTable, strings
WHERE (SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) > 0   

As in the first example, you can add the following to the SELECT list to see the match count:

(SELECT COUNT(1) FROM UNNEST(column) AS col JOIN UNNEST(strs) AS str ON col = str) AS cnt
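
For sanity-checking results from the Python side, the counting logic of both queries can be modeled locally. This is only a rough sketch of the JOIN-and-COUNT semantics, not anything BigQuery runs; match_counts and your_table are hypothetical names, and the data is the sample data from the two queries above.

```python
def match_counts(rows, strs):
    """Return {id: cnt} for rows whose array shares at least one element with strs."""
    result = {}
    for row_id, column in rows:
        # JOIN ... ON col = str produces one row per equal (col, str) pair,
        # so COUNT(1) is the number of such pairs.
        cnt = sum(1 for col in column for s in strs if col == s)
        if cnt > 0:  # mirrors the "> 0" filter in the WHERE clause
            result[row_id] = cnt
    return result

your_table = [
    (1, ["abc", "def", "xyz"]),
    (2, ["123", "456", "789"]),
    (3, ["135", "246", "369"]),
]

# Strings from the first example (the strings table).
print(match_counts(your_table, ["abc", "123", "456"]))  # {1: 1, 2: 2}

# Strings from the second example (the strs array).
print(match_counts(your_table, ["abc", "def", "456"]))  # {1: 2, 2: 1}
```

Row 3 is absent from both results because none of its elements match. Because the count is over pairs, a value repeated in either array is counted once per matching pair.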