Sliding window aggregate for year-week in BigQuery
My question is about sliding-window sums in BigQuery.
I have a table like the following:
run_id year_week value
001 201451 5
001 201452 8
001 201501 1
001 201505 5
003 201352 8
003 201401 1
003 201405 5
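Here, the week number within each year can run from 01 to 53. For example, the last week of 2014 is 201452, whereas the last week of 2015 is 201553. If it makes life easier, I only have 5 years (2013, 2014, 2015, 2016 and 2017), and only 2015 has weeks that go up to 53.
Now, for each run, I am trying to get a sliding-window sum of the values. Each year_week should take the sum of the values of the next 5 year_weeks (including itself) for the current run_id (e.g. 001). For example, the following could be the output for the current table: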
run_id year_week aggregate_sum
001 201451 5+8+1+0+0
001 201452 8+1+0+0+0
001 201501 1+0+0+0+5
001 201502 0+0+0+5+0
001 201503 0+0+5+0+0
001 201504 0+5+0+0+0
001 201505 5+0+0+0+0
003 201352 8+1+0+0+0
003 201401 1+0+0+0+5
003 201402 0+0+0+5+0
003 201403 0+0+5+0+0
003 201404 0+5+0+0+0
003 201405 5+0+0+0+0
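To explain what is happening: the next 5 weeks for 201451, including itself, are 201451, 201452, 201501, 201502 and 201503. If the table has values for those weeks for the current run_id, we just sum them up, which gives 5+8+1+0+0, because the corresponding value for a week is 0 if that week is not in the table.
Is it possible to do this using sliding-window operations in BigQuery?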
The following is for BigQuery Standard SQL:
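#standardSQL
WITH weeks AS (
  -- Calendar of every ISO year-week for 2013-2017. December 28 always falls in
  -- the last ISO week of its year, so its ISOWEEK is the week count (52 or 53).
  SELECT 100 * year + week AS year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) AS year,
    UNNEST(GENERATE_ARRAY(1, EXTRACT(ISOWEEK FROM DATE(year, 12, 28)))) AS week
), temp AS (
  -- One row per (run_id, year_week); value is NULL for weeks missing from the table.
  SELECT i.run_id, w.year_week, d.year_week AS week2, value
  FROM weeks w
  CROSS JOIN (SELECT DISTINCT run_id FROM `project.dataset.table`) i
  LEFT JOIN `project.dataset.table` d
  USING(year_week, run_id)
)
SELECT * FROM (
  SELECT run_id, year_week,
    SUM(value) OVER(win) AS aggregate_sum
  FROM temp
  WINDOW win AS (
    -- Current week plus the 4 following calendar weeks.
    PARTITION BY run_id ORDER BY year_week ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
  )
)
-- Drop weeks whose whole 5-week frame has no data.
WHERE aggregate_sum IS NOT NULL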
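The join against the weeks calendar matters because a ROWS frame counts rows, not weeks, so every missing week has to exist as a row before the window is applied. As a minimal sketch of what goes wrong otherwise (the inline `input_rows` CTE is only the run 001 rows from the question, not a real table), here is the same frame applied directly to the sparse data:

```sql
#standardSQL
-- Sketch: the same 5-row frame, but WITHOUT the weeks calendar.
WITH input_rows AS (
  SELECT '001' AS run_id, 201451 AS year_week, 5 AS value UNION ALL
  SELECT '001', 201452, 8 UNION ALL
  SELECT '001', 201501, 1 UNION ALL
  SELECT '001', 201505, 5
)
SELECT run_id, year_week,
  SUM(value) OVER (
    PARTITION BY run_id ORDER BY year_week
    ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
  ) AS aggregate_sum
FROM input_rows
ORDER BY run_id, year_week
```

For 201451 this sums all four rows (5+8+1+5 = 19) rather than the intended 5+8+1+0+0 = 14, because 201505 is only three rows away even though it is six weeks away.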
You can test / play with the full query using the dummy data from your question, as shown below:
#standardSQL
WITH `project.dataset.table` AS (
  -- Dummy data copied from the question (both runs).
  SELECT '001' AS run_id, 201451 AS year_week, 5 AS value UNION ALL
  SELECT '001', 201452, 8 UNION ALL
  SELECT '001', 201501, 1 UNION ALL
  SELECT '001', 201505, 5 UNION ALL
  SELECT '003', 201352, 8 UNION ALL
  SELECT '003', 201401, 1 UNION ALL
  SELECT '003', 201405, 5
), weeks AS (
  -- December 28 always falls in the last ISO week of its year (52 or 53).
  SELECT 100 * year + week AS year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) AS year,
    UNNEST(GENERATE_ARRAY(1, EXTRACT(ISOWEEK FROM DATE(year, 12, 28)))) AS week
), temp AS (
  SELECT i.run_id, w.year_week, d.year_week AS week2, value
  FROM weeks w
  CROSS JOIN (SELECT DISTINCT run_id FROM `project.dataset.table`) i
  LEFT JOIN `project.dataset.table` d
  USING(year_week, run_id)
)
SELECT * FROM (
  SELECT run_id, year_week,
    SUM(value) OVER(win) AS aggregate_sum
  FROM temp
  WINDOW win AS (
    PARTITION BY run_id ORDER BY year_week ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
  )
)
WHERE aggregate_sum IS NOT NULL
ORDER BY run_id, year_week
The result is:
Row run_id year_week aggregate_sum
1 001 201447 5
2 001 201448 13
3 001 201449 14
4 001 201450 14
5 001 201451 14
6 001 201452 9
7 001 201501 6
8 001 201502 5
9 001 201503 5
10 001 201504 5
11 001 201505 5
12 003 201348 8
13 003 201349 9
14 003 201350 9
15 003 201351 9
16 003 201352 9
17 003 201401 6
18 003 201402 5
19 003 201403 5
20 003 201404 5
21 003 201405 5
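Note that this result also contains weeks before each run's first observation (for example 201447 to 201450 for run 001), whereas the expected output in the question starts at the run's first year_week. If that matters, one possible variation is to limit the calendar to each run's own first-to-last observed week before windowing. This is only a sketch under that assumption; the `bounds` CTE and its column names are illustrative and not part of the query above:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT '001' AS run_id, 201451 AS year_week, 5 AS value UNION ALL
  SELECT '001', 201452, 8 UNION ALL
  SELECT '001', 201501, 1 UNION ALL
  SELECT '001', 201505, 5 UNION ALL
  SELECT '003', 201352, 8 UNION ALL
  SELECT '003', 201401, 1 UNION ALL
  SELECT '003', 201405, 5
), weeks AS (
  -- December 28 always lies in the last ISO week of its year.
  SELECT 100 * year + week AS year_week
  FROM UNNEST([2013, 2014, 2015, 2016, 2017]) AS year,
    UNNEST(GENERATE_ARRAY(1, EXTRACT(ISOWEEK FROM DATE(year, 12, 28)))) AS week
), bounds AS (
  -- Hypothetical helper: first and last observed week per run.
  SELECT run_id, MIN(year_week) AS first_week, MAX(year_week) AS last_week
  FROM `project.dataset.table`
  GROUP BY run_id
), temp AS (
  -- Dense calendar per run, limited to that run's observed range.
  SELECT b.run_id, w.year_week, d.value
  FROM bounds b
  JOIN weeks w
    ON w.year_week BETWEEN b.first_week AND b.last_week
  LEFT JOIN `project.dataset.table` d
    ON d.run_id = b.run_id AND d.year_week = w.year_week
)
SELECT run_id, year_week,
  -- Missing weeks count as 0, as described in the question.
  SUM(IFNULL(value, 0)) OVER (
    PARTITION BY run_id ORDER BY year_week
    ROWS BETWEEN CURRENT ROW AND 4 FOLLOWING
  ) AS aggregate_sum
FROM temp
ORDER BY run_id, year_week
```

With the question's data this should return seven rows for run 001 (201451 through 201505, with sums 14, 9, 6, 5, 5, 5, 5) and six rows for run 003 (201352 through 201405, with sums 9, 6, 5, 5, 5, 5), which matches the sums written out in the question.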
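Note: this is written for the constraint stated in the question ("I only have 5 years, 2013, 2014, 2015, 2016 and 2017"), but it can easily be extended by adding more years to the weeks CTE.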
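As a sketch of that extension, the hard-coded year list can be replaced with GENERATE_ARRAY over a year range. The 2010 to 2025 bounds below are placeholders, not values from the question, and the final SELECT is only there to inspect the week count per year:

```sql
#standardSQL
-- Sketch: ISO year-week calendar for an arbitrary (placeholder) year range.
WITH weeks AS (
  SELECT 100 * year + week AS year_week
  FROM UNNEST(GENERATE_ARRAY(2010, 2025)) AS year,
    -- December 28 always falls in the last ISO week, so this yields 52 or 53.
    UNNEST(GENERATE_ARRAY(1, EXTRACT(ISOWEEK FROM DATE(year, 12, 28)))) AS week
)
SELECT DIV(year_week, 100) AS year, COUNT(*) AS weeks_in_year
FROM weeks
GROUP BY year
ORDER BY year
```

Within 2013 to 2017 this should report 53 weeks only for 2015, in line with the question; in the wider placeholder range, 2020 should be the only other 53-week year.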