Simple Distinct Count across Bigquery arrays without using HLL or UDFs

As in the example here, I want to do a distinct count across BigQuery arrays:

However, I have some additional requirements that rule out the solutions provided in that post:

So while this extended example (which adds User as a grouping dimension) works with HLL:

#standardSQL
WITH
  test AS (
  SELECT
    'A' AS User, DATE('2018-01-01') AS ReportDate, 2 AS value, [1,2,3] AS key
  UNION ALL
  SELECT
    'A' AS User, DATE('2018-01-02') AS ReportDate, 3 AS value, [1,4,5] AS key
  UNION ALL
  SELECT
    'B' AS User, DATE('2018-01-02') AS ReportDate, 4 AS value, [4,5,6,7,8] AS key
  UNION ALL
  SELECT
    'B' AS User, DATE('2018-01-02') AS ReportDate, 5 AS value, [3,4,5,6,7] AS key )
SELECT
  User,
  SUM(value) total_value,
  HLL_COUNT.MERGE((
    SELECT
      HLL_COUNT.INIT(key)
    FROM
      UNNEST(key) key)) AS unique_key_count
FROM
  test
GROUP BY
  user

I need a version that performs a distinct count over the aggregated arrays and meets the requirements above.

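This also means it should work correctly if I group only by ReportDate, by the combination User / ReportDate, or in scenarios where this example is extended with further dimensions.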

#standardSQL
WITH test AS
(
  SELECT 'A' AS User, DATE('2018-01-01') AS ReportDate, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT 'A' AS User, DATE('2018-01-02') AS ReportDate, 3 AS value, [1,4,5] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 4 AS value, [4,5,6,7,8] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 5 AS value, [3,4,5,6,7] AS key  
)
SELECT 
  User,
  SUM(IF(flag=0, value, 0)) total_value,
  COUNT(DISTINCT key) unique_key_count
FROM test, UNNEST(key) key WITH OFFSET flag
GROUP BY User   

Result

Row User    total_value unique_key_count     
1   A       5           5    
2   B       9           6
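The question also asks for other groupings. As a sketch (not part of the original answer), the same query shape grouped by ReportDate only, on the same test data, could look like this. The only change besides the GROUP BY is that the unnested element is aliased k instead of key, which is an assumption made purely to avoid any confusion with the array column of the same name. Offset 0 exists exactly once per source row, so each row's value is summed once no matter how many array elements it has, while COUNT(DISTINCT k) counts the distinct elements across all arrays in the group.

```sql
#standardSQL
-- Same test data as above; only the grouping dimension changes.
WITH test AS
(
  SELECT 'A' AS User, DATE('2018-01-01') AS ReportDate, 2 AS value, [1,2,3] AS key UNION ALL
  SELECT 'A' AS User, DATE('2018-01-02') AS ReportDate, 3 AS value, [1,4,5] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 4 AS value, [4,5,6,7,8] AS key UNION ALL
  SELECT 'B' AS User, DATE('2018-01-02') AS ReportDate, 5 AS value, [3,4,5,6,7] AS key
)
SELECT
  ReportDate,
  -- Offset 0 occurs once per source row, so each row's value is added exactly once.
  SUM(IF(flag = 0, value, 0)) AS total_value,
  -- Exact distinct count of all array elements within the group.
  COUNT(DISTINCT k) AS unique_key_count
FROM test, UNNEST(key) AS k WITH OFFSET AS flag
GROUP BY ReportDate
ORDER BY ReportDate;
```

On this data it should return 2018-01-01 with total_value 2 and unique_key_count 3, and 2018-01-02 with total_value 12 and unique_key_count 7. Grouping by User, ReportDate works the same way and should give A / 2018-01-01: 2, 3; A / 2018-01-02: 3, 3; B / 2018-01-02: 9, 6. One caveat: the comma in FROM test, UNNEST(key) is a CROSS JOIN, so a row whose key array is empty or NULL produces no output and its value is not summed; using LEFT JOIN UNNEST(key) AS k WITH OFFSET AS flag together with IFNULL(flag, 0) = 0 should keep such rows.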