How to get distinct values on GROUP_CONCAT using Google Big Query
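I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.
I'll reproduce the situation with a simpler, static example:
EDIT: I modified the example to better represent my real situation: two GROUP_CONCAT columns that both need distinct values: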
SELECT
category,
GROUP_CONCAT(id) as ids,
GROUP_CONCAT(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product)
GROUP BY
category
This example returns:
Row category ids products
1 a 1,2,3,1 car,car,car,truck
2 b 4,5,2 car,car,bike
I want to remove the duplicate values and return something like:
Row category ids products
1 a 1,2,3 car,truck
2 b 4,5,2 car,bike
In MySQL, GROUP_CONCAT has a DISTINCT option, but BigQuery's GROUP_CONCAT does not.
Any ideas?
Removing the duplicates before applying GROUP_CONCAT gives the result you want:
SELECT
category,
GROUP_CONCAT(id) as ids
FROM (
SELECT category, id
FROM
(SELECT "a" as category, "1" as id),
(SELECT "a" as category, "2" as id),
(SELECT "a" as category, "3" as id),
(SELECT "b" as category, "4" as id),
(SELECT "b" as category, "5" as id),
(SELECT "b" as category, "6" as id),
(SELECT "a" as category, "1" as id)
GROUP BY
category, id
)
GROUP BY
category
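The answer above only deduplicates `id`. Because the edited question needs two independently deduplicated columns, one way to extend the same idea is to dedupe each column in its own subquery and join the two results on `category` (grouping by `category, id, product` together would not remove the repeated `car` values). This sketch runs that shape of query against an in-memory SQLite database through Python's `sqlite3` module, as a local stand-in for BigQuery; the function name `distinct_concat` and the table name `t` are hypothetical, and SQLite's `group_concat` does not guarantee the order of the concatenated values.

```python
import sqlite3

# The question's sample rows: (category, id, product)
ROWS = [
    ("a", "1", "car"),
    ("a", "2", "car"),
    ("a", "3", "car"),
    ("b", "4", "car"),
    ("b", "5", "car"),
    ("b", "2", "bike"),
    ("a", "1", "truck"),
]


def distinct_concat(rows):
    """Dedupe each column separately, then concatenate per category (SQLite stand-in)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (category TEXT, id TEXT, product TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    query = """
        SELECT i.category, i.ids, p.products
        FROM (
            -- inner GROUP BY removes duplicate (category, id) pairs
            SELECT category, group_concat(id) AS ids
            FROM (SELECT category, id FROM t GROUP BY category, id)
            GROUP BY category
        ) AS i
        JOIN (
            -- same idea, applied to product on its own
            SELECT category, group_concat(product) AS products
            FROM (SELECT category, product FROM t GROUP BY category, product)
            GROUP BY category
        ) AS p ON i.category = p.category
        ORDER BY i.category
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


print(distinct_concat(ROWS))
```

Each category should come back with ids `1,2,3` / products `car,truck` for `a` and ids `4,5,2` / products `car,bike` for `b`, possibly in a different order inside each string.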
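Here is a solution that removes the duplicates with the UNIQUE scoped aggregate function. Note that to use it, we first need to build a REPEATED field with the NEST aggregation: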
SELECT
GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD
FROM (
SELECT
category,
NEST(id) as ids,
NEST(product) as products
FROM
(SELECT "a" as category, "1" as id, "car" as product),
(SELECT "a" as category, "2" as id, "car" as product),
(SELECT "a" as category, "3" as id, "car" as product),
(SELECT "b" as category, "4" as id, "car" as product),
(SELECT "b" as category, "5" as id, "car" as product),
(SELECT "b" as category, "2" as id, "bike" as product),
(SELECT "a" as category, "1" as id, "truck" as product)
GROUP BY
category
)
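To make the NEST / UNIQUE / GROUP_CONCAT pipeline concrete, here is a plain-Python model of it. It assumes `NEST` behaves like collecting each group's values into a list and `UNIQUE ... WITHIN RECORD` like dropping repeats from that list; `nest_unique_concat` is a hypothetical helper, not BigQuery code. The model keeps first-occurrence order, which is an assumption of this sketch; BigQuery's output order may differ.

```python
from collections import defaultdict

# The question's sample rows: (category, id, product)
ROWS = [
    ("a", "1", "car"),
    ("a", "2", "car"),
    ("a", "3", "car"),
    ("b", "4", "car"),
    ("b", "5", "car"),
    ("b", "2", "bike"),
    ("a", "1", "truck"),
]


def nest_unique_concat(rows):
    """Model of NEST (collect lists), UNIQUE (dedupe) and GROUP_CONCAT (join)."""
    # "NEST": one record per category holding two repeated lists
    nested = defaultdict(lambda: ([], []))
    for category, id_, product in rows:
        ids, products = nested[category]
        ids.append(id_)
        products.append(product)

    # "UNIQUE" + "GROUP_CONCAT WITHIN RECORD": dedupe each list, then join it
    out = {}
    for category, (ids, products) in nested.items():
        out[category] = (
            ",".join(dict.fromkeys(ids)),       # dict.fromkeys keeps first occurrences
            ",".join(dict.fromkeys(products)),
        )
    return out


print(nest_unique_concat(ROWS))
# {'a': ('1,2,3', 'car,truck'), 'b': ('4,5,2', 'car,bike')}
```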
In standard SQL (the preferred BigQuery dialect), the solution is STRING_AGG with DISTINCT, grouped by category:
SELECT
  q.category,
  STRING_AGG(DISTINCT q.id, ',') AS ids,
  STRING_AGG(DISTINCT q.product, ',') AS products
FROM (
  SELECT "a" AS category, "1" AS id, "car" AS product
  UNION ALL
  SELECT "a" AS category, "2" AS id, "car" AS product
  UNION ALL
  SELECT "a" AS category, "3" AS id, "car" AS product
  UNION ALL
  SELECT "b" AS category, "4" AS id, "car" AS product
  UNION ALL
  SELECT "b" AS category, "5" AS id, "car" AS product
  UNION ALL
  SELECT "b" AS category, "2" AS id, "bike" AS product
  UNION ALL
  SELECT "a" AS category, "1" AS id, "truck" AS product
) AS q
GROUP BY q.category
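SQLite supports the same DISTINCT-inside-the-aggregate pattern through `group_concat(DISTINCT x)`, so the grouped query can be checked locally with Python's `sqlite3` module. This is a SQLite stand-in, not BigQuery; `distinct_aggregate` and the table `t` are hypothetical names, and because SQLite only accepts DISTINCT with the single-argument form of `group_concat`, the default `,` separator is used. The order of values inside each string is not guaranteed.

```python
import sqlite3

# The question's sample rows: (category, id, product)
ROWS = [
    ("a", "1", "car"),
    ("a", "2", "car"),
    ("a", "3", "car"),
    ("b", "4", "car"),
    ("b", "5", "car"),
    ("b", "2", "bike"),
    ("a", "1", "truck"),
]


def distinct_aggregate(rows):
    """Per-category distinct concatenation, analogous to STRING_AGG(DISTINCT ...)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (category TEXT, id TEXT, product TEXT)")
    conn.executemany("INSERT INTO t VALUES (?, ?, ?)", rows)
    result = conn.execute(
        "SELECT category, group_concat(DISTINCT id), group_concat(DISTINCT product) "
        "FROM t GROUP BY category ORDER BY category"
    ).fetchall()
    conn.close()
    return result


print(distinct_aggregate(ROWS))
```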