BigQuery/SQL: Sum over intervals indicated by a secondary table

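Suppose I have two tables: intervals, which contains index intervals (its columns are i_min and i_max), and values, which contains indexed values (columns i and x). Here is an example: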

 values:        intervals:
+---+---+   +-------+-------+
| i | x |   | i_min | i_max |
+---+---+   +-------+-------+
| 1 | 1 |   |   1   |   4   |
| 2 | 0 |   |   6   |   6   |
| 3 | 4 |   |   6   |   6   |
| 4 | 9 |   |   6   |   6   |
| 6 | 7 |   |   7   |   9   |
| 7 | 2 |   |  12   |  17   |
| 8 | 2 |   +-------+-------+
| 9 | 2 |
+---+---+

I want to sum the values of x over each interval:

       result:
+-------+-------+-----+
| i_min | i_max | sum |
+-------+-------+-----+
|   1   |   4   |  14 | // 1+0+4+9
|   6   |   6   |   7 |
|   6   |   6   |   7 |
|   6   |   6   |   7 |
|   7   |   9   |   6 | // 2+2+2
|  12   |  17   |   0 |
+-------+-------+-----+

In some SQL engines, this can be done with:

SELECT
  i_min,
  i_max,
  (SELECT SUM(x)
   FROM values 
   WHERE i BETWEEN intervals.i_min AND intervals.i_max) AS sum_x
FROM
  intervals

Except that BigQuery does not allow this type of query ("Subselect not allowed in SELECT clause." or "LEFT OUTER JOIN cannot be used without a condition that is an equality of fields from both sides of the join.", depending on the syntax used).

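There must be a way to do this with window functions, but I can't figure out how: all the examples I have seen have the partition as part of the table. Is there an option that does not use CROSS JOIN? If not, what is the most efficient way to perform this CROSS JOIN?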

Some notes about my data:

Try the following, written for BigQuery Standard SQL:

#standardSQL
SELECT
  i_min, i_max, SUM(x) AS  sum_x
FROM (
  SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM `project.dataset.intervals`
) AS intervals
JOIN (SELECT i, x FROM `project.dataset.values` UNION ALL SELECT NULL, 0) AS values
ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL 
GROUP BY i_min, i_max, line
-- ORDER BY i_min

You can test and play with this using dummy data, as below:

#standardSQL
WITH intervals AS (
  SELECT  1 AS i_min, 4 AS i_max UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  7, 9 UNION ALL
  SELECT 12, 17 
),
values AS (
  SELECT 1 AS i, 1 AS x UNION ALL
  SELECT 2, 0 UNION ALL
  SELECT 3, 4 UNION ALL
  SELECT 4, 9 UNION ALL
  SELECT 6, 7 UNION ALL
  SELECT 7, 2 UNION ALL
  SELECT 8, 2 UNION ALL
  SELECT 9, 2 
)
SELECT
  i_min, i_max, SUM(x) AS  sum_x
FROM (SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM intervals) AS intervals
JOIN (SELECT i, x FROM values UNION ALL SELECT NULL, 0) AS values
ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL 
GROUP BY i_min, i_max, line
-- ORDER BY i_min
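
As a rough sketch of why the extra (NULL, 0) row and the line column are there, the query below reuses the same dummy data and the same join but drops the GROUP BY, so the individual matched rows are visible. The selected column list and the ORDER BY are my additions for readability and are not part of the answer above.

```sql
#standardSQL
-- Illustrative only: same data and join as the query above, without aggregation.
WITH intervals AS (
  SELECT  1 AS i_min, 4 AS i_max UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  6, 6 UNION ALL
  SELECT  7, 9 UNION ALL
  SELECT 12, 17
),
values AS (
  SELECT 1 AS i, 1 AS x UNION ALL
  SELECT 2, 0 UNION ALL
  SELECT 3, 4 UNION ALL
  SELECT 4, 9 UNION ALL
  SELECT 6, 7 UNION ALL
  SELECT 7, 2 UNION ALL
  SELECT 8, 2 UNION ALL
  SELECT 9, 2
)
SELECT
  intervals.line,   -- arbitrary per-row number; keeps duplicate intervals apart
  intervals.i_min,
  intervals.i_max,
  values.i,         -- NULL marks the sentinel row
  values.x          -- the sentinel contributes 0 to the sum
FROM (SELECT i_min, i_max, ROW_NUMBER() OVER() AS line FROM intervals) AS intervals
JOIN (SELECT i, x FROM values UNION ALL SELECT NULL, 0) AS values
ON values.i BETWEEN intervals.i_min AND intervals.i_max OR values.i IS NULL
ORDER BY intervals.i_min, intervals.line, values.i
```

Every interval matches the sentinel row through the `values.i IS NULL` branch, so an interval with no values in range (12 to 17 here) still survives the inner join and sums to 0 instead of disappearing. Grouping by line as well as i_min and i_max keeps the three identical (6, 6) intervals as three separate output rows; since ROW_NUMBER() OVER() has no ORDER BY, the actual numbers assigned are arbitrary and only their uniqueness matters.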