How to get distinct values on GROUP_CONCAT using Google Big Query

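I'm trying to get distinct values when using GROUP_CONCAT in BigQuery.

I'll reproduce the situation with a simpler, static example:

EDIT: I changed the example to better represent my real situation. There are two GROUP_CONCAT columns that both need distinct values: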

SELECT 
  category, 
  GROUP_CONCAT(id) as ids, 
  GROUP_CONCAT(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category

This example returns:

Row category    ids products
1   a   1,2,3,1 car,car,car,truck
2   b   4,5,2   car,car,bike

I want to remove the duplicate values and return:

Row category    ids products
1   a   1,2,3   car,truck
2   b   4,5,2   car,bike

In MySQL, GROUP_CONCAT has a DISTINCT option, but BigQuery's GROUP_CONCAT does not.
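For comparison, the MySQL-style `GROUP_CONCAT(DISTINCT ...)` form mentioned above can be tried locally. This minimal sketch assumes SQLite (through Python's built-in `sqlite3` module) as a stand-in, since SQLite also accepts `DISTINCT` inside a single-argument aggregate. It is not BigQuery, and the table name `items` and the function `distinct_group_concat` are hypothetical. It loads the rows of the edited example and deduplicates each column independently.

```python
import sqlite3

# Rows from the edited example: (category, id, product).
ROWS = [
    ("a", "1", "car"),
    ("a", "2", "car"),
    ("a", "3", "car"),
    ("b", "4", "car"),
    ("b", "5", "car"),
    ("b", "2", "bike"),
    ("a", "1", "truck"),
]


def distinct_group_concat(rows):
    """Return {category: (ids, products)} using GROUP_CONCAT(DISTINCT ...).

    SQLite is only a local stand-in for MySQL's GROUP_CONCAT(DISTINCT ...);
    it is not BigQuery.
    """
    conn = sqlite3.connect(":memory:")
    # "items" is a hypothetical table name used only for this sketch.
    conn.execute("CREATE TABLE items (category TEXT, id TEXT, product TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    result = {}
    for category, ids, products in conn.execute(
        """
        SELECT category,
               GROUP_CONCAT(DISTINCT id),
               GROUP_CONCAT(DISTINCT product)
        FROM items
        GROUP BY category
        """
    ):
        # Each column is deduplicated on its own, within its category.
        result[category] = (ids, products)
    conn.close()
    return result


if __name__ == "__main__":
    for category, (ids, products) in sorted(distinct_group_concat(ROWS).items()):
        print(category, ids, products)
```

The order of values inside each concatenated string is not guaranteed, so compare them as sets rather than as exact strings.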

Any ideas?

Removing the duplicates before applying GROUP_CONCAT gives the result you want:

    SELECT 
      category, 
      GROUP_CONCAT(id) as ids
    FROM (  
    SELECT category, id
    FROM 
     (SELECT "a" as category, "1" as id),
     (SELECT "a" as category, "2" as id),
     (SELECT "a" as category, "3" as id),
     (SELECT "b" as category, "4" as id),
     (SELECT "b" as category, "5" as id),
     (SELECT "b" as category, "6" as id),
     (SELECT "a" as category, "1" as id),
    GROUP BY 
      category, id
    )
    GROUP BY 
      category
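The dedupe-then-concatenate query shape above can be checked without a BigQuery project. This is a minimal sketch that assumes SQLite's `GROUP_CONCAT` (via Python's `sqlite3`) as a stand-in for BigQuery legacy `GROUP_CONCAT`. The table name `items` and the helper `dedupe_then_concat` are hypothetical.

```python
import sqlite3

# Rows from the single-column example above: (category, id).
ROWS = [
    ("a", "1"), ("a", "2"), ("a", "3"),
    ("b", "4"), ("b", "5"), ("b", "6"),
    ("a", "1"),
]

# Same shape as the answer: the inner GROUP BY removes duplicate
# (category, id) pairs, then the outer GROUP BY concatenates per category.
QUERY = """
SELECT category, GROUP_CONCAT(id) AS ids
FROM (
    SELECT category, id
    FROM items
    GROUP BY category, id
)
GROUP BY category
"""


def dedupe_then_concat(rows):
    """Return {category: ids}, where ids is a comma-separated string."""
    conn = sqlite3.connect(":memory:")
    # "items" is a hypothetical table name used only for this sketch.
    conn.execute("CREATE TABLE items (category TEXT, id TEXT)")
    conn.executemany("INSERT INTO items VALUES (?, ?)", rows)
    result = dict(conn.execute(QUERY).fetchall())
    conn.close()
    return result
```

Because the inner `GROUP BY` runs over `(category, id)`, it deduplicates one column only. Grouping by `(category, id, product)` would keep every distinct pair (category `a` would still yield `1,2,3,1`), so for the question's two columns each column needs its own deduplicating subquery.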

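Here is a solution that removes duplicates with the UNIQUE scoped aggregate function. Note that to use it, we first need to build a REPEATED field with the NEST aggregate: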

SELECT 
  GROUP_CONCAT(UNIQUE(ids)) WITHIN RECORD,
  GROUP_CONCAT(UNIQUE(products)) WITHIN RECORD 
FROM (
SELECT 
  category, 
  NEST(id) as ids, 
  NEST(product) as products
FROM 
 (SELECT "a" as category, "1" as id, "car" as product),
 (SELECT "a" as category, "2" as id, "car" as product),
 (SELECT "a" as category, "3" as id, "car" as product),
 (SELECT "b" as category, "4" as id, "car" as product),
 (SELECT "b" as category, "5" as id, "car" as product),
 (SELECT "b" as category, "2" as id, "bike" as product),
 (SELECT "a" as category, "1" as id, "truck" as product),
GROUP BY 
  category
)

In standard SQL (the preferred BigQuery dialect), STRING_AGG accepts DISTINCT, so the solution is:

SELECT 
    q.category,
    string_agg(distinct q.id, ',') as ids_distinct,
    string_agg(distinct q.product, ',') as products_distinct

FROM 
    (
        (SELECT "a" as category, "1" as id, "car" as product)
        union all
        (SELECT "a" as category, "2" as id, "car" as product)
        union all
        (SELECT "a" as category, "3" as id, "car" as product)
        union all
        (SELECT "b" as category, "4" as id, "car" as product)
        union all
        (SELECT "b" as category, "5" as id, "car" as product)
        union all
        (SELECT "b" as category, "2" as id, "bike" as product)
        union all
        (SELECT "a" as category, "1" as id, "truck" as product)
    ) as q
GROUP BY
    q.category