BigQuery query to get the sum of one column's values based on another column

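I want to write a BigQuery query that fills a column using a "like" condition, where each value is the sum of another column's values. In the table below, starts_with_count is the column I want to fill; I added its expected values by hand to show what I am after. The other columns already exist. For each row, starts_with_count should be SUM(full_count) over all rows whose link starts with that row's link.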

company  link                               full_count  starts_with_count (expected)
abc      http://www.abc.net1                1           15  (= sum(full_count) where link like 'http://www.abc.net1%')
abc      http://www.abc.net1/page1          2           9   (= sum(full_count) where link like 'http://www.abc.net1/page1%')
abc      http://www.abc.net1/page1/folder1  3           3   (= sum(full_count) where link like 'http://www.abc.net1/page1/folder1%')
abc      http://www.abc.net1/page1/folder2  4           4
abc      http://www.abc.net1/page2          5           5
xyz      http://www.xyz.net1/               6           21
xyz      http://www.xyz.net1/page1/         7           15
xyz      http://www.xyz.net1/page1/file1    8           8

Try this:

WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT first.company, first.link, SUM(second.full_count) AS starts_with_count
  FROM sample first, sample second 
 WHERE STARTS_WITH(second.link, first.link)
 GROUP BY 1, 2
;

Output: the result matches the expected starts_with_count values in the question.
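
The self-join above compares every row with every other row, regardless of company. If a link should only count rows from its own company (an assumption; the question does not say so explicitly), the join can be restricted as in the sketch below. The aliases a and b and the ORDER BY are illustrative additions. STARTS_WITH is kept rather than LIKE because characters such as _ and % inside a link would act as wildcards in a LIKE pattern.

```sql
-- Sketch: same self-join as above, restricted to rows of the same company.
-- Assumption: a link only counts rows that belong to its own company.
WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT a.company, a.link, SUM(b.full_count) AS starts_with_count
  FROM sample AS a
  JOIN sample AS b
    ON a.company = b.company            -- equality key narrows the join
   AND STARTS_WITH(b.link, a.link)      -- b.link begins with a.link
 GROUP BY a.company, a.link
 ORDER BY a.company, a.link             -- only for readable output
;
```

For the sample rows this should give the same sums as the unrestricted version, since no link of one company is a prefix of a link of the other.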

Another option:

select * except(links),
  ( select sum(full_count)
    from t.links
    where starts_with(link, t.link)
  ) starts_with_count
from (
  select *, 
    array_agg(struct(link, full_count)) over(partition by company) links
  from your_table
) t     
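
To try this query without creating a table, the sample rows from the question can be supplied through a CTE. The sketch below assumes the CTE is named your_table to match the placeholder in the query; the ORDER BY is added only to make the result easier to read.

```sql
-- Sketch: the window-function query run against the question's sample rows.
-- Assumption: the CTE name your_table stands in for the real table.
WITH your_table AS (
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT * EXCEPT(links),
  -- Sum full_count over the company's links that start with this row's link.
  ( SELECT SUM(full_count)
      FROM t.links
     WHERE STARTS_WITH(link, t.link)
  ) AS starts_with_count
FROM (
  SELECT *,
    -- Attach every (link, full_count) pair of the company to each row.
    ARRAY_AGG(STRUCT(link, full_count)) OVER (PARTITION BY company) AS links
  FROM your_table
) t
ORDER BY company, link;
```

Because the array is built per company, each row is compared only with rows of the same company, so no self-join across the whole table is needed.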

If applied to the sample data in your question, the output matches the expected starts_with_count values.

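For the simple/dummy example you provided, this option performs significantly better!

Which one to use really depends on your real data!
To profile both, use the EXECUTION DETAILS tab.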