BigQuery - reducing to unique records within a field
I have a table with fields like this:
ID  Field 1         Field 2
1   22,34,05,44,44  01,02,02,03
2   11,01,05        02,02,01,01,22
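How can I transform this in BigQuery (standardSQL) so that each field shows only its unique values, sorted in ascending order?
The output should look like this: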
ID  Field 1      Field 2
1   05,22,34,44  01,02,03
2   01,05,11     01,02,22
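I tried using SPLIT, but then I ran into duplicates hundreds of times, and window functions do not allow DISTINCT, so I could not put the pieces back together afterwards.
Any help would be appreciated.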
You can split the strings to turn them into arrays, then use DISTINCT to remove duplicates and ORDER BY to sort:
SELECT
ID,
ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x) AS field1,
ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x) AS field2
FROM `project-name`.dataset.table
If you want to turn the arrays back into comma-separated strings, you can use the ARRAY_TO_STRING function:
SELECT
ID,
ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x), ',') AS field1,
ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x), ',') AS field2
FROM `project-name`.dataset.table
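To try the query without access to the real table, here is a minimal sketch that inlines the two sample rows from the question in a CTE. The CTE name `sample` is a stand-in chosen for illustration and is not part of the original schema. Note that ORDER BY x compares the values as strings, which gives the expected order here because every value is zero-padded to two digits; values of differing widths would sort as text (for example '10' before '9').

```sql
-- Hypothetical inline data standing in for `project-name`.dataset.table
WITH sample AS (
  SELECT 1 AS ID, '22,34,05,44,44' AS field1, '01,02,02,03' AS field2
  UNION ALL
  SELECT 2, '11,01,05', '02,02,01,01,22'
)
SELECT
  ID,
  -- Split into an array, deduplicate and sort per row, then join back into a string
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field1, ',')) AS x ORDER BY x), ',') AS field1,
  ARRAY_TO_STRING(ARRAY(SELECT DISTINCT x FROM UNNEST(SPLIT(field2, ',')) AS x ORDER BY x), ',') AS field2
FROM sample
ORDER BY ID
```

Because DISTINCT and ORDER BY run inside the ARRAY subquery for each row, no join or window function is needed and the row count of the table is unchanged.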