BigQuery query to get the sum of one column's values based on another column
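I want to write a BigQuery query that fills in one column with the sum of another column's values, based on a "like" condition.
In the table below, starts_with_count is the column I want to populate. I added its expected values by hand to show what I'm after; the other columns already exist.
starts_with_count is SUM(full_count) over every row whose link starts with the current row's link (the row itself included).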
company | link | full_count | starts_with_count (expected) |
---|---|---|---|
abc | http://www.abc.net1 | 1 | 15 (= sum (full_count) where link like 'http://www.abc.net1%') |
abc | http://www.abc.net1/page1 | 2 | 9 (= sum (full_count) where link like 'http://www.abc.net1/page1%') |
abc | http://www.abc.net1/page1/folder1 | 3 | 3 (= sum (full_count) where link like 'http://www.abc.net1/page1/folder1%') |
abc | http://www.abc.net1/page1/folder2 | 4 | 4 |
abc | http://www.abc.net1/page2 | 5 | 5 |
xyz | http://www.xyz.net1/ | 6 | 21 |
xyz | http://www.xyz.net1/page1/ | 7 | 15 |
xyz | http://www.xyz.net1/page1/file1 | 8 | 8 |
Try this:
    WITH sample AS (
      SELECT * FROM UNNEST([
        STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
        ('abc', 'http://www.abc.net1/page1', 2),
        ('abc', 'http://www.abc.net1/page1/folder1', 3),
        ('abc', 'http://www.abc.net1/page1/folder2', 4),
        ('abc', 'http://www.abc.net1/page2', 5),
        ('xyz', 'http://www.xyz.net1/', 6),
        ('xyz', 'http://www.xyz.net1/page1/', 7),
        ('xyz', 'http://www.xyz.net1/page1/file1', 8)
      ])
    )
    SELECT first.company, first.link, SUM(second.full_count) AS starts_with_count
    FROM sample first, sample second
    WHERE STARTS_WITH(second.link, first.link)
    GROUP BY 1, 2;
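Output: for the sample data, starts_with_count comes out as 15, 9, 3, 4, 5 for the abc links and 21, 15, 8 for the xyz links, matching the expected column in the question.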
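The self-join in this answer compares every row with every other row, regardless of company. A minimal sketch of a variant follows, assuming a link and the links nested under it always belong to the same company and that each link appears only once (both hold for the sample data, but are assumptions about real data). It moves the company match into an explicit JOIN condition so that only rows within one company are compared. The aliases `parent` and `child` are illustrative names, not part of the original answer.

```sql
WITH sample AS (
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT
  parent.company,
  parent.link,
  parent.full_count,
  -- Sum full_count over every row of the same company whose link
  -- begins with the parent row's link (the parent row matches itself).
  SUM(child.full_count) AS starts_with_count
FROM sample AS parent
JOIN sample AS child
  ON child.company = parent.company            -- assumption: prefixes never cross companies
 AND STARTS_WITH(child.link, parent.link)
GROUP BY parent.company, parent.link, parent.full_count
ORDER BY parent.company, parent.link;
```

Because `STARTS_WITH` does a literal prefix comparison, characters such as `%` or `_` inside a link are not treated as wildcards, which they would be if the condition were written with `LIKE CONCAT(parent.link, '%')`.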
Another option:
    select * except(links),
      ( select sum(full_count)
        from t.links
        where starts_with(link, t.link)
      ) starts_with_count
    from (
      select *,
        array_agg(struct(link, full_count)) over(partition by company) links
      from your_table
    ) t
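If applied to the sample data in your question, the output has the same starts_with_count values as the expected column (15, 9, 3, 4, 5 for abc and 21, 15, 8 for xyz).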
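To try this approach without a real table, the sketch below inlines the question's sample rows as a CTE named `your_table` (a stand-in for the actual table) and spells out the implicit `from t.links` as an explicit `UNNEST`. The alias `l` and the final `ORDER BY` are additions for readability, not part of the original answer.

```sql
WITH your_table AS (
  -- Stand-in for the real table: the sample rows from the question.
  SELECT * FROM UNNEST([
    STRUCT('abc' AS company, 'http://www.abc.net1' AS link, 1 AS full_count),
    ('abc', 'http://www.abc.net1/page1', 2),
    ('abc', 'http://www.abc.net1/page1/folder1', 3),
    ('abc', 'http://www.abc.net1/page1/folder2', 4),
    ('abc', 'http://www.abc.net1/page2', 5),
    ('xyz', 'http://www.xyz.net1/', 6),
    ('xyz', 'http://www.xyz.net1/page1/', 7),
    ('xyz', 'http://www.xyz.net1/page1/file1', 8)
  ])
)
SELECT
  * EXCEPT (links),
  -- For each row, scan the per-company array and sum the counts of
  -- every element whose link begins with this row's link.
  (
    SELECT SUM(l.full_count)
    FROM UNNEST(t.links) AS l
    WHERE STARTS_WITH(l.link, t.link)
  ) AS starts_with_count
FROM (
  SELECT
    *,
    -- Attach all (link, full_count) pairs of the company to each row.
    ARRAY_AGG(STRUCT(link, full_count)) OVER (PARTITION BY company) AS links
  FROM your_table
) AS t
ORDER BY company, link;
```

The design trade-off is that every row carries a copy of its company's full array, so no join is needed, but a company with very many links produces correspondingly large per-row arrays.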
For the simple/dummy example you provided, performance is significantly better!
Which one to use really depends on your real data!
To profile, use the EXECUTION DETAILS tab.
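Besides the console tab, the two queries can also be compared from SQL after each has been run once. This is a rough sketch, assuming the jobs ran in the `us` multi-region and that you have permission to read the project's job metadata; the region qualifier, the one-hour window, and the row limit are illustrative choices to adjust.

```sql
SELECT
  job_id,
  creation_time,
  total_slot_ms,            -- slot time consumed by the job
  total_bytes_processed,    -- bytes scanned by the job
  TIMESTAMP_DIFF(end_time, start_time, MILLISECOND) AS elapsed_ms
FROM `region-us`.INFORMATION_SCHEMA.JOBS_BY_PROJECT   -- assumption: US multi-region
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
  AND job_type = 'QUERY'
  AND state = 'DONE'
ORDER BY creation_time DESC
LIMIT 10;
```

Comparing `total_slot_ms` and `elapsed_ms` for the two jobs on real data gives a more reliable picture than timings on the eight-row sample.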