SQLite Window functions

This is a simplified ER diagram of my database:

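What I want to retrieve, for each vendor_item, is the current price together with the lowest and highest prices, where the low and high exclude the latest capture.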

Here is some sample data from the PRICE_DATA table to give you an idea:

vendor_item_id  capture_ts                  price
124             2022-03-02 09:00:12.851043  46.78
124             2022-03-02 14:07:49.423343  42.99
124             2022-03-04 08:20:07.636140  43.99
124             2022-03-05 08:29:20.421764  42.99
124             2022-03-08 08:33:59.043372  42.99
129             2022-03-02 08:55:14.401816  21.52
129             2022-03-02 14:11:20.544427  25.54
129             2022-03-04 08:24:06.976667  25.72
129             2022-03-08 08:22:46.734662  30.83
132             2022-03-02 09:04:18.144494  41.99
132             2022-03-03 08:29:15.981712  42.99
132             2022-03-04 08:27:39.327779  41.99
132             2022-03-07 08:29:41.236009  42.99
132             2022-03-08 08:27:44.318570  40.99

This is my current SQL statement:

select distinct vendor_item_id
      ,last_value(price) over win as curr_price
      ,min(price) over win as low_price
      ,max(price) over win as high_price
from price_data
window win as (partition by vendor_item_id 
               order by capture_ts 
               rows between unbounded preceding 
                        and unbounded following);

While this gives me more or less what I'm looking for, there are a few problems:

Desired result:

vendor_item_id  curr_price  low_price  high_price
124             42.99       42.99      46.78
129             30.83       21.52      25.72
132             40.99       41.99      42.99

Thanks for your help!
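To make the gap between the current query and the desired result concrete, here is a minimal sketch that runs the question's query against the sample rows, using Python's built-in sqlite3 module. The three-column schema is an assumption (the real table has more columns), and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows copied from the question: (vendor_item_id, capture_ts, price).
SAMPLE = [
    (124, "2022-03-02 09:00:12.851043", 46.78),
    (124, "2022-03-02 14:07:49.423343", 42.99),
    (124, "2022-03-04 08:20:07.636140", 43.99),
    (124, "2022-03-05 08:29:20.421764", 42.99),
    (124, "2022-03-08 08:33:59.043372", 42.99),
    (129, "2022-03-02 08:55:14.401816", 21.52),
    (129, "2022-03-02 14:11:20.544427", 25.54),
    (129, "2022-03-04 08:24:06.976667", 25.72),
    (129, "2022-03-08 08:22:46.734662", 30.83),
    (132, "2022-03-02 09:04:18.144494", 41.99),
    (132, "2022-03-03 08:29:15.981712", 42.99),
    (132, "2022-03-04 08:27:39.327779", 41.99),
    (132, "2022-03-07 08:29:41.236009", 42.99),
    (132, "2022-03-08 08:27:44.318570", 40.99),
]

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "create table price_data (vendor_item_id integer, capture_ts text, price real)"
)
conn.executemany("insert into price_data values (?, ?, ?)", SAMPLE)

# The query from the question, unchanged apart from the trailing semicolon.
QUESTION_QUERY = """
select distinct vendor_item_id
      ,last_value(price) over win as curr_price
      ,min(price) over win as low_price
      ,max(price) over win as high_price
from price_data
window win as (partition by vendor_item_id
               order by capture_ts
               rows between unbounded preceding
                        and unbounded following)
"""

# DISTINCT does not guarantee an output order, so sort on the Python side.
current = sorted(conn.execute(QUESTION_QUERY).fetchall())
for row in current:
    print(row)
```

Because the window frame spans the whole partition, the latest capture takes part in min and max: item 129 comes back with high_price 30.83 instead of 25.72, and item 132 with low_price 40.99 instead of 41.99.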

Use a CTE that returns the maximum capture_ts for each vendor_item_id, then get low_price and high_price with conditional aggregation:

WITH cte AS (
  SELECT *, MAX(capture_ts) OVER (PARTITION BY vendor_item_id) max_capture_ts
  FROM price_data
)
SELECT DISTINCT vendor_item_id,
       FIRST_VALUE(price) OVER (PARTITION BY vendor_item_id ORDER BY capture_ts DESC) curr_price,
       MIN(CASE WHEN capture_ts < max_capture_ts THEN price END) OVER (PARTITION BY vendor_item_id) low_price, 
       MAX(CASE WHEN capture_ts < max_capture_ts THEN price END) OVER (PARTITION BY vendor_item_id) high_price
FROM cte;

See the demo.
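To illustrate why the conditional aggregation works, the sketch below selects the CASE expression row by row for item 129 before aggregating it. It uses Python's sqlite3 module with an assumed three-column schema; the alias price_excl_latest is a name made up for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "create table price_data (vendor_item_id integer, capture_ts text, price real)"
)
conn.executemany(
    "insert into price_data values (?, ?, ?)",
    [
        (129, "2022-03-02 08:55:14.401816", 21.52),
        (129, "2022-03-02 14:11:20.544427", 25.54),
        (129, "2022-03-04 08:24:06.976667", 25.72),
        (129, "2022-03-08 08:22:46.734662", 30.83),
    ],
)

# The CTE from the answer: every row also carries its partition's latest timestamp.
CTE = """
WITH cte AS (
  SELECT *, MAX(capture_ts) OVER (PARTITION BY vendor_item_id) max_capture_ts
  FROM price_data
)
"""

# Per-row view of the CASE expression: it yields NULL for the latest capture.
per_row = conn.execute(
    CTE
    + """
    SELECT price,
           CASE WHEN capture_ts < max_capture_ts THEN price END AS price_excl_latest
    FROM cte
    ORDER BY capture_ts
    """
).fetchall()

# The full query from the answer; MIN and MAX skip the NULL produced above.
summary = conn.execute(
    CTE
    + """
    SELECT DISTINCT vendor_item_id,
           FIRST_VALUE(price) OVER (PARTITION BY vendor_item_id ORDER BY capture_ts DESC) curr_price,
           MIN(CASE WHEN capture_ts < max_capture_ts THEN price END) OVER (PARTITION BY vendor_item_id) low_price,
           MAX(CASE WHEN capture_ts < max_capture_ts THEN price END) OVER (PARTITION BY vendor_item_id) high_price
    FROM cte
    """
).fetchall()

print(per_row)
print(summary)
```

The latest row maps to NULL (None in Python), and since aggregate functions ignore NULL, low_price and high_price are computed over the earlier captures only.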

I ended up solving the problem with CTEs and regular aggregate functions:

with v_last_capture as (
    select vendor_item_id
          ,max(capture_ts) last_capture_ts
      from price_data pd
     group by vendor_item_id
)
, v_curr_price as (
    select pd.*
      from price_data pd 
      inner join v_last_capture vc 
            on (pd.vendor_item_id = vc.vendor_item_id and 
                pd.capture_ts = vc.last_capture_ts)
)
, v_other_prices as (
    select vendor_item_id
          ,min(pd.price) as min_price
          ,max(pd.price) as max_price
      from price_data pd
     where id not in (select id from v_curr_price)
     group by vendor_item_id 
)
select vc.id
      ,vc.vendor_item_id 
      ,vc.price as curr_price 
      ,vc.stock
      ,vo.min_price
      ,vo.max_price 
  from v_curr_price vc 
  left join v_other_prices vo on (vc.vendor_item_id = vo.vendor_item_id)

Explain plan:

QUERY PLAN
|--MATERIALIZE 4
|  |--SCAN TABLE price_data AS pd
|  `--USE TEMP B-TREE FOR GROUP BY
|--MATERIALIZE 5
|  |--SCAN TABLE price_data AS pd
|  |--LIST SUBQUERY 6
|  |  |--MATERIALIZE 8
|  |  |  |--SCAN TABLE price_data AS pd
|  |  |  `--USE TEMP B-TREE FOR GROUP BY
|  |  |--SCAN SUBQUERY 8 AS vc
|  |  `--SEARCH TABLE price_data AS pd USING AUTOMATIC COVERING INDEX (vendor_item_id=? AND capture_ts=?)
|  `--USE TEMP B-TREE FOR GROUP BY
|--SCAN TABLE price_data AS pd
|--SEARCH SUBQUERY 4 AS vc USING AUTOMATIC COVERING INDEX (vendor_item_id=?)
`--SEARCH SUBQUERY 5 AS vo USING AUTOMATIC COVERING INDEX (vendor_item_id=?)

The other answer works equally well (and its query is more concise). This is the explain plan for that query:

QUERY PLAN
|--CO-ROUTINE 3
|  |--CO-ROUTINE 4
|  |  |--CO-ROUTINE 1
|  |  |  |--CO-ROUTINE 5
|  |  |  |  |--SCAN TABLE price_data
|  |  |  |  `--USE TEMP B-TREE FOR ORDER BY
|  |  |  `--SCAN SUBQUERY 5
|  |  |--SCAN SUBQUERY 1
|  |  `--USE TEMP B-TREE FOR ORDER BY
|  |--SCAN SUBQUERY 4
|  `--USE TEMP B-TREE FOR ORDER BY
|--SCAN SUBQUERY 3
`--USE TEMP B-TREE FOR DISTINCT

You can use window filters to drop the last row, which satisfies the "except the latest capture" requirement:

select distinct
    p.vendor_item_id
    ,last_value(p.price) over vendor_item as curr_price
    ,min(price) filter (where p.capture_ts < latest.capture_ts) over vendor_item as low_price
    ,max(price) filter (where p.capture_ts < latest.capture_ts) over vendor_item as high_price
from
    price_data p
    inner join (
        select vendor_item_id, max(capture_ts) capture_ts from price_data group by vendor_item_id
    ) latest on latest.vendor_item_id = p.vendor_item_id
window
    vendor_item as (
        partition by p.vendor_item_id
        order by p.capture_ts 
        rows between unbounded preceding and unbounded following
    );

Result:

124 42.99   42.99   46.78
129 30.83   21.52   25.72
132 40.99   41.99   42.99

I assume capture_ts is unique per vendor_item_id; otherwise you would have to come up with a smarter filter.

Bare query plan, with no indexes defined on the price_data table:
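As a quick check, the filter-based query can be run against the question's sample rows. This is a minimal sketch using Python's sqlite3 module; the three-column schema is an assumption, and the FILTER clause on window aggregates needs SQLite 3.25 or later.

```python
import sqlite3

# Sample rows copied from the question: (vendor_item_id, capture_ts, price).
SAMPLE = [
    (124, "2022-03-02 09:00:12.851043", 46.78),
    (124, "2022-03-02 14:07:49.423343", 42.99),
    (124, "2022-03-04 08:20:07.636140", 43.99),
    (124, "2022-03-05 08:29:20.421764", 42.99),
    (124, "2022-03-08 08:33:59.043372", 42.99),
    (129, "2022-03-02 08:55:14.401816", 21.52),
    (129, "2022-03-02 14:11:20.544427", 25.54),
    (129, "2022-03-04 08:24:06.976667", 25.72),
    (129, "2022-03-08 08:22:46.734662", 30.83),
    (132, "2022-03-02 09:04:18.144494", 41.99),
    (132, "2022-03-03 08:29:15.981712", 42.99),
    (132, "2022-03-04 08:27:39.327779", 41.99),
    (132, "2022-03-07 08:29:41.236009", 42.99),
    (132, "2022-03-08 08:27:44.318570", 40.99),
]

conn = sqlite3.connect(":memory:")
# Assumed minimal schema: only the columns the query touches.
conn.execute(
    "create table price_data (vendor_item_id integer, capture_ts text, price real)"
)
conn.executemany("insert into price_data values (?, ?, ?)", SAMPLE)

# The query from this answer: the FILTER clause keeps the latest capture
# out of min/max, while last_value still sees the whole partition.
FILTER_QUERY = """
select distinct
    p.vendor_item_id
    ,last_value(p.price) over vendor_item as curr_price
    ,min(price) filter (where p.capture_ts < latest.capture_ts) over vendor_item as low_price
    ,max(price) filter (where p.capture_ts < latest.capture_ts) over vendor_item as high_price
from
    price_data p
    inner join (
        select vendor_item_id, max(capture_ts) capture_ts from price_data group by vendor_item_id
    ) latest on latest.vendor_item_id = p.vendor_item_id
window
    vendor_item as (
        partition by p.vendor_item_id
        order by p.capture_ts
        rows between unbounded preceding and unbounded following
    )
"""

# DISTINCT does not guarantee an output order, so sort on the Python side.
result = sorted(conn.execute(FILTER_QUERY).fetchall())
for row in result:
    print(row)
```

The rows match the desired result from the question for all three items.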

QUERY PLAN
|--CO-ROUTINE 3
|  |--MATERIALIZE 1
|  |  |--SCAN TABLE price_data
|  |  `--USE TEMP B-TREE FOR GROUP BY
|  |--SCAN TABLE price_data AS p
|  |--SEARCH SUBQUERY 1 AS latest USING AUTOMATIC COVERING INDEX (vendor_item_id=?)
|  `--USE TEMP B-TREE FOR ORDER BY
|--SCAN SUBQUERY 3
`--USE TEMP B-TREE FOR DISTINCT

After defining a covering index (create index ix_price_data on price_data (vendor_item_id, capture_ts, price)), things get a bit simpler:

QUERY PLAN
|--CO-ROUTINE 3
|  |--MATERIALIZE 1
|  |  `--SCAN TABLE price_data USING COVERING INDEX ix_price_data
|  |--SCAN SUBQUERY 1 AS latest
|  |--SEARCH TABLE price_data AS p USING COVERING INDEX ix_price_data (vendor_item_id=?)
|  `--USE TEMP B-TREE FOR ORDER BY
|--SCAN SUBQUERY 3
`--USE TEMP B-TREE FOR DISTINCT
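The effect of the index can be reproduced on the "latest capture" subquery alone. The sketch below uses Python's sqlite3 module; the column types are assumptions, and the exact wording of the plan lines varies between SQLite versions (newer releases print SCAN price_data rather than SCAN TABLE price_data).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed schema: the column types are guesses, only the names come from the thread.
conn.execute(
    """
    create table price_data (
        id integer primary key,
        vendor_item_id integer,
        capture_ts text,
        price real
    )
    """
)

# The subquery that finds the latest capture per vendor item.
LATEST = (
    "select vendor_item_id, max(capture_ts) capture_ts "
    "from price_data group by vendor_item_id"
)


def plan(sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[3] for row in conn.execute("explain query plan " + sql)]


before = plan(LATEST)
conn.execute(
    "create index ix_price_data on price_data (vendor_item_id, capture_ts, price)"
)
after = plan(LATEST)

print("without index:", before)
print("with index:   ", after)
```

Without the index, the plan scans the table and sorts through a temporary B-tree for the GROUP BY; with it, the same subquery is answered from the covering index.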

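Since a covering index increases the database size (after all, all the data exists as a copy in the index), you may decide to re-create price_data as a clustered index, i.e. create the table WITHOUT ROWID and mark (vendor_item_id, capture_ts) as the primary key. You can also drop the then-useless id column.

That way you get the same performance as with the explicit index, but without increasing the database size (in fact, the table should get noticeably smaller, because the rowid is gone). The query plan stays the same.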
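A minimal sketch of such a table definition, again via Python's sqlite3 module. The column types and NOT NULL constraints are assumptions; only the column names and the choice of primary key come from the text above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Clustered layout: rows are stored in the primary-key B-tree itself,
# so there is no separate rowid and no separate covering index.
conn.execute(
    """
    create table price_data (
        vendor_item_id integer not null,
        capture_ts     text    not null,
        price          real    not null,
        primary key (vendor_item_id, capture_ts)
    ) without rowid
    """
)
conn.executemany(
    "insert into price_data values (?, ?, ?)",
    [
        (129, "2022-03-02 08:55:14.401816", 21.52),
        (129, "2022-03-02 14:11:20.544427", 25.54),
        (129, "2022-03-04 08:24:06.976667", 25.72),
        (129, "2022-03-08 08:22:46.734662", 30.83),
    ],
)

# A WITHOUT ROWID table has no rowid column to select.
try:
    conn.execute("select rowid from price_data")
    has_rowid = True
except sqlite3.OperationalError:
    has_rowid = False

# The primary key rejects a second row with the same item and timestamp.
try:
    conn.execute(
        "insert into price_data values (129, '2022-03-08 08:22:46.734662', 31.00)"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print("has rowid:", has_rowid)
print("duplicate rejected:", duplicate_rejected)
```

A side effect of this layout is that the primary key enforces the uniqueness of capture_ts per vendor_item_id, which the filter-based query relies on.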