BigQuery and Standard SQL: aggregate per distinct field in array
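I am trying to calculate the total traffic (in gbps) for each distinct IP, summed across all five-minute periods.
I am new to BigQuery, so after looking at other examples I tried the following: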
WITH `project.dataset.test` AS (
SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
),
ip AS (
SELECT DISTINCT (ip_address) ip_address
FROM `project.dataset.test`
),
qualified AS (
SELECT ip_address, network, ARRAY_AGG (gbps ORDER BY ip_address DESC LIMIT 1)[SAFE_OFFSET(0)] gbps
FROM `project.dataset.test`
GROUP BY ip_address, network
)
SELECT ip_address, network, SUM(gbps) gbps
FROM (
SELECT d.ip_address ip_address, network, ARRAY_AGG (gbps ORDER BY q.ip_address DESC LIMIT 1)[SAFE_OFFSET(0)] gbps
FROM ip d
JOIN qualified q
ON q.ip_address = d.ip_address
GROUP BY ip_address, network
)
GROUP BY ip_address, network
ORDER BY gbps DESC
I expected the output to be:
Row ip_address network gbps
1 192.168.10.3 12 9
2 192.168.10.2 11 6
3 192.168.10.1 10 3
Instead, the actual output is:
Row ip_address network gbps
1 192.168.10.3 12 3
2 192.168.10.2 11 2
3 192.168.10.1 10 1
What am I doing wrong? How do I select the sum for each distinct IP, regardless of the number of five-minute periods and/or networks? FYI, I have thousands of rows to sort through; this is just a sample of what I am working with.
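The example below is for BigQuery Standard SQL. In the original query, ARRAY_AGG(gbps ... LIMIT 1)[SAFE_OFFSET(0)] keeps only one gbps value per (ip_address, network) group, so the outer SUM has a single value to add. Summing directly over the source rows avoids that.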
#standardSQL
WITH `project.dataset.test` AS (
SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
)
SELECT ip_address, network, SUM(gbps) gbps
FROM `project.dataset.test`
GROUP BY ip_address, network
-- explicit sort so the rows come back in the order shown in the result
ORDER BY gbps DESC
Result:
Row ip_address network gbps
1 192.168.10.3 12 9
2 192.168.10.2 11 6
3 192.168.10.1 10 3
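For the "regardless of networks" part of the question, here is a minimal sketch. It assumes the goal is a single total per IP even when the same IP shows up under more than one network, which is an interpretation of the question rather than something the sample data exercises. The only change is dropping network from both SELECT and GROUP BY.

```sql
#standardSQL
WITH `project.dataset.test` AS (
  SELECT '01/01/2019 12:30' time, '192.168.10.1' ip_address, 10 network, 1 gbps UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:30', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:35', '192.168.10.3', 12, 3 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.1', 10, 1 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.2', 11, 2 UNION ALL
  SELECT '01/01/2019 12:40', '192.168.10.3', 12, 3
)
-- One output row per IP: every five-minute sample for that IP is summed,
-- whichever network it was recorded under.
SELECT ip_address, SUM(gbps) AS gbps
FROM `project.dataset.test`
GROUP BY ip_address
ORDER BY gbps DESC
```

In the sample data each IP belongs to exactly one network, so the totals are the same as above (9, 6 and 3). The two queries only differ once an IP appears under several networks.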