SQLite Running Total (but keyed on values in a different table)

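I'm trying to generate a running total, but I need its last value for each row of a different table. In the example below, I can easily generate a running total for each time value in T, but I want the running total for each of the time values in P (rather than getting the price value for each transaction in T, which is trivial):

Given a table of transactions T such as: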

user   hour  item  delta
Alice  1     A     1
Alice  1     A     2
Bob    2     A     2
Alice  3     A     1
Bob    3     B     1
Alice  5     A     -1
Bob    5     B     3

And a pricing table P such as:

hour  item  price
1     A     1.1
1     B     1.2
2     A     2.1
2     B     2.2
3     A     3.1
3     B     3.2
4     A     4.1
4     B     4.2
5     A     5.1
5     B     5.2

I want a record for each hour in P where the user's running total is non-zero. Something like:

hour  item  price  user   running_total
1     A     1.1    Alice  3
2     A     2.1    Alice  3
2     A     2.1    Bob    2
3     A     3.1    Alice  4
3     A     3.1    Bob    2
3     B     3.2    Bob    1
4     A     4.1    Alice  4
4     A     4.1    Bob    2
4     B     4.2    Bob    1
5     A     5.1    Alice  3
5     A     5.1    Bob    2
5     B     5.2    Bob    4

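I'm fine with zeros or nulls in place of the rows I left out (i.e. before Bob has any items). The key problem is that I want each user's balance for every hour in which an item has a price.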

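I'm currently doing this very stupidly, using a procedural language and iterating over all the hour values in P - but since I think I'm just looking for a filtered cartesian product between the pricing table and the running-total table, I figure there must be a better way.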

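My current solution, which iterates over the pricing table imperatively (~3K rows in the pricing table, 10K rows in the transactions table), takes about 250 ms. The following SQL seems to do the job, but takes about 25 seconds, so I'm hoping there's a better way: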

with ranked_b as (
    select F.*, row_number() over (partition by p_hour, user, item order by hour desc) as rn
    from (select P.hour as p_hour, P.price, B.*  from P cross join (select distinct a.hour, a.user, a.item, sum(a.delta) over (partition by a.user, a.item order by a.hour) running_total from T a order by a.hour) B on P.item=B.item and B.hour<=P.hour  order by P_hour, B.user, B.item, B.hour) F
)  SELECT p_hour as hour, item, price, user, running_total from ranked_b where rn=1;

I have 2 suggestions/simplifications for your code.

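First, ORDER BY clauses without LIMIT inside subqueries are completely useless: they do not affect the final result, and they only degrade the performance of the query.
So, remove them from the B and F subqueries.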

Also, you are doing a CROSS JOIN, even though you use an ON clause.
This is equivalent to an INNER JOIN, which is what you should use, because (from Simple Select Processing):

The "CROSS JOIN" join operator produces the same result as the "INNER JOIN", "JOIN" and "," operators, but is handled differently by the query optimizer in that it prevents the query optimizer from reordering the tables in the join. An application programmer can use the CROSS JOIN operator to directly influence the algorithm that is chosen to implement the SELECT statement. Avoid using CROSS JOIN except in specific situations where manual control of the query optimizer is desired. Avoid using CROSS JOIN early in the development of an application as doing so is a premature optimization. The special handling of CROSS JOIN is an SQLite-specific feature and is not a part of standard SQL.
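The "same result" part of that quote can be checked with a small sketch using Python's built-in sqlite3 module. The two tiny tables below are made-up illustration data (not the tables from the question), and the `{join}` placeholder is just a convenience of this sketch:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE P (hour INTEGER, item TEXT, price REAL);
    CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
    -- Made-up rows, only for illustration.
    INSERT INTO P VALUES (1, 'A', 1.1), (2, 'A', 2.1), (2, 'B', 2.2);
    INSERT INTO T VALUES ('Alice', 1, 'A', 1), ('Bob', 2, 'A', 2), ('Bob', 2, 'B', 1);
""")

# Same query twice; only the join operator differs.
query = """
    SELECT P.hour, P.item, T.user, T.delta
    FROM P {join} T ON P.item = T.item AND T.hour <= P.hour
    ORDER BY P.hour, P.item, T.user
"""
cross_rows = con.execute(query.format(join="CROSS JOIN")).fetchall()
inner_rows = con.execute(query.format(join="INNER JOIN")).fetchall()

# Both operators return the same rows; what differs is whether the
# optimizer is allowed to reorder the tables.
print(cross_rows == inner_rows)
```

Whether letting the optimizer reorder the join actually helps depends on your data and indexes, so it is worth timing both forms on the real tables.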

Try this:

WITH ranked_b AS (
  SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
  FROM (
    SELECT P.hour p_hour, P.price, B.*  
    FROM P 
    INNER JOIN (
      SELECT DISTINCT hour, user, item, 
             SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
      FROM T
    ) B ON P.item = B.item AND B.hour <= P.hour  
  ) F
)  
SELECT p_hour hour, item, price, user, running_total 
FROM ranked_b 
WHERE rn = 1;
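As a quick sanity check, here is a minimal sketch that loads the sample data from the question into an in-memory database with Python's built-in sqlite3 module and runs the query above. The column types and the final ORDER BY are my additions (the ORDER BY only makes the output order deterministic), and window functions need SQLite 3.25 or later:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Column types are assumed; the question does not give the schema.
    CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
    CREATE TABLE P (hour INTEGER, item TEXT, price REAL);
    INSERT INTO T VALUES
        ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
        ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1),
        ('Bob', 5, 'B', 3);
    INSERT INTO P VALUES
        (1, 'A', 1.1), (1, 'B', 1.2), (2, 'A', 2.1), (2, 'B', 2.2),
        (3, 'A', 3.1), (3, 'B', 3.2), (4, 'A', 4.1), (4, 'B', 4.2),
        (5, 'A', 5.1), (5, 'B', 5.2);
""")

rows = con.execute("""
    WITH ranked_b AS (
      SELECT F.*, ROW_NUMBER() OVER (PARTITION BY p_hour, user, item ORDER BY hour DESC) rn
      FROM (
        SELECT P.hour p_hour, P.price, B.*
        FROM P
        INNER JOIN (
          SELECT DISTINCT hour, user, item,
                 SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total
          FROM T
        ) B ON P.item = B.item AND B.hour <= P.hour
      ) F
    )
    SELECT p_hour hour, item, price, user, running_total
    FROM ranked_b
    WHERE rn = 1
    ORDER BY p_hour, item, user  -- added only for a stable output order
""").fetchall()

for row in rows:
    print(row)
```

On the sample data this should give the 12 rows listed in the question, one per (hour, item, user) combination once the user has a transaction for that item.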

Or, another version that uses SQLite's bare columns:

SELECT p_hour hour, item, price, user, running_total
FROM (
  SELECT P.hour p_hour, P.price, B.*  
  FROM P 
  INNER JOIN (
    SELECT DISTINCT hour, user, item, 
           SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total 
    FROM T
  ) B ON P.item = B.item AND B.hour <= P.hour  
) F
GROUP BY p_hour, user, item
HAVING MAX(hour);
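This version relies on SQLite's documented behavior that, when MAX() is used in an aggregate query, the bare columns (here price and running_total) take their values from the row that holds the maximum. A minimal sketch with Python's built-in sqlite3 module, again on the question's sample data with assumed column types and an added ORDER BY:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Column types are assumed; the question does not give the schema.
    CREATE TABLE T (user TEXT, hour INTEGER, item TEXT, delta INTEGER);
    CREATE TABLE P (hour INTEGER, item TEXT, price REAL);
    INSERT INTO T VALUES
        ('Alice', 1, 'A', 1), ('Alice', 1, 'A', 2), ('Bob', 2, 'A', 2),
        ('Alice', 3, 'A', 1), ('Bob', 3, 'B', 1), ('Alice', 5, 'A', -1),
        ('Bob', 5, 'B', 3);
    INSERT INTO P VALUES
        (1, 'A', 1.1), (1, 'B', 1.2), (2, 'A', 2.1), (2, 'B', 2.2),
        (3, 'A', 3.1), (3, 'B', 3.2), (4, 'A', 4.1), (4, 'B', 4.2),
        (5, 'A', 5.1), (5, 'B', 5.2);
""")

bare_rows = con.execute("""
    SELECT p_hour hour, item, price, user, running_total
    FROM (
      SELECT P.hour p_hour, P.price, B.*
      FROM P
      INNER JOIN (
        SELECT DISTINCT hour, user, item,
               SUM(delta) OVER (PARTITION BY user, item ORDER BY hour) running_total
        FROM T
      ) B ON P.item = B.item AND B.hour <= P.hour
    ) F
    GROUP BY p_hour, user, item
    HAVING MAX(hour)   -- MAX picks the row that supplies the bare columns
    ORDER BY p_hour, item, user  -- added only for a stable output order
""").fetchall()

for row in bare_rows:
    print(row)
```

One caveat, which is my own reading rather than something stated above: HAVING MAX(hour) also acts as a filter, so a group whose latest transaction hour is 0 would be dropped. With hours starting at 1, as in the sample data, that does not come up.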

See the demo.