SQLite Running Total (but keyed on values in a different table)

Question

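I'm trying to generate a running total, but I need the latest value for each row of a different table. In the example below, I can easily generate a running total for each time value in T, but I want the running total at each time value in P (rather than having each transaction in T pick up a price value, which is trivial):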

Given a transactions table T such as:

user	hour	item	delta
Alice	1	A	1
Alice	1	A	2
Bob	2	A	2
Alice	3	A	1
Bob	3	B	1
Alice	5	A	-1
Bob	5	B	3

and a pricing table P such as:

hour	item	price
1	A	1.1
1	B	1.2
2	A	2.1
2	B	2.2
3	A	3.1
3	B	3.2
4	A	4.1
4	B	4.2
5	A	5.1
5	B	5.2

I want a record for each hour in P at which the user's running total is nonzero. Something like:

hour	item	price	user	running_total
1	A	1.1	Alice	3
2	A	2.1	Alice	3
2	A	2.1	Bob	2
3	A	3.1	Alice	4
3	A	3.1	Bob	2
3	B	3.2	Bob	1
4	A	4.1	Alice	4
4	A	4.1	Bob	2
4	B	4.2	Bob	1
5	A	5.1	Alice	3
5	A	5.1	Bob	2
5	B	5.2	Bob	4

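I would be fine with zeros or nulls in place of the rows I have omitted (i.e. before Bob holds any item). The key problem is that, for every hour at which an item has a price, I want every user's balance.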

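I'm currently doing this very naively, using a procedural language and iterating over all the hour values in P. But since I think I'm just looking for a filtered Cartesian product between the pricing table and the running-total table, I figure there must be a better way.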

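My current brute-force solution of iterating over the pricing table (about 3K rows in the pricing table, 10K rows in the transactions table) takes about 250 ms. The following SQL seems to do the job but takes about 25 seconds, so I'm hoping there is a better way: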

with ranked_b as (
    select F.*, row_number() over (partition by p_hour, user, item order by hour desc) as rn
    from (select P.hour as p_hour, P.price, B.*  from P cross join (select distinct a.hour, a.user, a.item, sum(a.delta) over (partition by a.user, a.item order by a.hour) running_total from T a order by a.hour) B on P.item=B.item and B.hour<=P.hour  order by P_hour, B.user, B.item, B.hour) F
)  SELECT p_hour as hour, item, price, user, running_total from ranked_b where rn=1;

Answer 1

I have two suggestions/simplifications for your code.

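First, an ORDER BY clause without LIMIT inside a subquery is useless: apart from degrading query performance, it has no effect on the final result.
So remove them from the B and F subqueries.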
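As a rough illustration of the point about ORDER BY, the sketch below loads the question's sample table T into an in-memory database with Python's `sqlite3` module and compares the running-total subquery with and without its inner ORDER BY. The in-memory database and the variable names are assumptions made for this example; it needs SQLite 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
INSERT INTO T VALUES
  ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
  ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1),
  ('Bob', 5, 'B', 3);
""")

# The running-total subquery (called B in the question).
base = """
SELECT DISTINCT hour, user, item,
       SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) AS running_total
FROM T
"""

# Same subquery, once with the inner ORDER BY and once without it.
# Only the outermost ORDER BY determines the order of the final result.
with_inner_order = conn.execute(
    f"SELECT * FROM ({base} ORDER BY hour) ORDER BY user, item, hour"
).fetchall()
without_inner_order = conn.execute(
    f"SELECT * FROM ({base}) ORDER BY user, item, hour"
).fetchall()

print(with_inner_order == without_inner_order)
```

If the final order matters, it belongs on the outermost SELECT, as in the two wrapper queries here.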

Also, although you use an ON clause, you are performing a CROSS JOIN.
This is equivalent to an INNER JOIN, which is what you should use, because (from Simple Select Processing):

The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators, but is handled differently by the query optimizer in that it prevents the query optimizer from reordering the tables in the join. An application programmer can use the CROSS JOIN operator to directly influence the algorithm that is chosen to implement the SELECT statement. Avoid using CROSS JOIN except in specific situations where manual control of the query optimizer is desired. Avoid using CROSS JOIN early in the development of an application as doing so is a premature optimization. The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL.
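To see this on the sample data, the following sketch runs the same join once with CROSS JOIN and once with INNER JOIN and checks that the rows agree. It also prints the output of EXPLAIN QUERY PLAN so the chosen plans can be inspected. The simplified join (P against T directly) and the variable names are assumptions for illustration, and on tables this small the two plans may well look identical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
INSERT INTO T VALUES
  ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
  ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1),
  ('Bob', 5, 'B', 3);
CREATE TABLE P (hour INTEGER, item TEXT, price REAL);
INSERT INTO P VALUES
  (1, 'A', 1.1), (1, 'B', 1.2), (2, 'A', 2.1), (2, 'B', 2.2),
  (3, 'A', 3.1), (3, 'B', 3.2), (4, 'A', 4.1), (4, 'B', 4.2),
  (5, 'A', 5.1), (5, 'B', 5.2);
""")

# Same join condition, only the join operator differs.
template = """
SELECT P.hour, P.item, T.user, T.hour, T.delta
FROM P {join} T ON P.item = T.item AND T.hour <= P.hour
"""

cross_rows = sorted(conn.execute(template.format(join="CROSS JOIN")).fetchall())
inner_rows = sorted(conn.execute(template.format(join="INNER JOIN")).fetchall())
print(cross_rows == inner_rows)

# Inspect the plans; with CROSS JOIN the planner keeps P as the outer table.
for join in ("CROSS JOIN", "INNER JOIN"):
    print(join)
    for row in conn.execute("EXPLAIN QUERY PLAN " + template.format(join=join)):
        print("  ", row[-1])
```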

Try this:

WITH ranked_b AS (
  SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
  FROM (
    SELECT P.hour p_hour, P.price, B.*  
    FROM P 
    INNER JOIN (
      SELECT DISTINCT hour, user, item, 
             SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
      FROM T
    ) B ON P.item = B.item AND B.hour <= P.hour  
  ) F
)  
SELECT p_hour hour, item, price, user, running_total 
FROM ranked_b 
WHERE rn = 1;
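As a quick check, here is a sketch that runs the query above against the question's sample data using Python's `sqlite3` module and prints the rows. The in-memory database and the variable names are assumptions for this example; window functions require SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
INSERT INTO T VALUES
  ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
  ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1),
  ('Bob', 5, 'B', 3);
CREATE TABLE P (hour INTEGER, item TEXT, price REAL);
INSERT INTO P VALUES
  (1, 'A', 1.1), (1, 'B', 1.2), (2, 'A', 2.1), (2, 'B', 2.2),
  (3, 'A', 3.1), (3, 'B', 3.2), (4, 'A', 4.1), (4, 'B', 4.2),
  (5, 'A', 5.1), (5, 'B', 5.2);
""")

query = """
WITH ranked_b AS (
  SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
  FROM (
    SELECT P.hour p_hour, P.price, B.*
    FROM P
    INNER JOIN (
      SELECT DISTINCT hour, user, item,
             SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total
      FROM T
    ) B ON P.item = B.item AND B.hour <= P.hour
  ) F
)
SELECT p_hour hour, item, price, user, running_total
FROM ranked_b
WHERE rn = 1
"""

# The query has no outer ORDER BY, so sort in Python for a stable display.
rows = sorted(conn.execute(query).fetchall())
for row in rows:
    print(row)
```

On the sample data this gives the 12 rows listed in the question, for example `(5, 'B', 5.2, 'Bob', 4)` for Bob's holding of item B at hour 5.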

Or, another version that uses SQLite's bare columns:

SELECT p_hour hour, item, price, user, running_total
FROM (
  SELECT P.hour p_hour, P.price, B.*  
  FROM P 
  INNER JOIN (
    SELECT DISTINCT hour, user, item, 
           SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
    FROM T
  ) B ON P.item = B.item AND B.hour <= P.hour  
) F
GROUP BY p_hour, user, item
HAVING MAX(hour);
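For readers unfamiliar with bare columns, here is a minimal sketch of the behavior this version relies on: in an aggregate query with a single MAX() (or MIN()), SQLite takes the non-aggregated columns from the row that holds the maximum. The table `balances` and its contents are hypothetical, made up for this example, and the sketch places MAX() in the select list rather than in HAVING as the query above does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical table: one running total per user and hour.
CREATE TABLE balances (user TEXT, hour INTEGER, running_total INTEGER);
INSERT INTO balances VALUES
  ('Alice', 1, 3), ('Alice', 3, 4), ('Alice', 5, 3), ('Bob', 2, 2);
""")

# running_total is a bare column: it is neither aggregated nor in GROUP BY.
# SQLite fills it from the row where hour is maximal within each group.
latest = conn.execute("""
SELECT user, MAX(hour) AS last_hour, running_total
FROM balances
GROUP BY user
ORDER BY user
""").fetchall()

print(latest)  # [('Alice', 5, 3), ('Bob', 2, 2)]
```

This is SQLite-specific behavior. Note also that a HAVING MAX(hour) condition is false for a group whose maximum hour is 0 or NULL, so the query above assumes positive hour values, as in the sample data.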

See the demo.
