How to get a moving-window argmax in PostgreSQL

I'm trying to find the moving argmax of a column in my database using PostgreSQL window functions. Here's what I have so far:

select *,
(max(case when price = roll_max then (row_num) end) over (partition by roll_max order by s_date)) as argmax
from (
   select s_id, s_date, price, 
   row_number() over (partition by s_id order by s_date) as row_num,
   max(high_price) over (partition by s_id order by s_date rows 10 preceding) as roll_max
   from sample_table
) tb1
order by s_date

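The code above is adapted from another source. I had to add partition by s_id because there are many different s_ids; the table's unique key is (s_id, s_date). So I need the argmax for every such pair, on every available date.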

Here is the output I get for some sample data (window size 10):

+-------+--------------+---------+---------+----------+------------------------------------------+
| s_id  |    s_date    |  price  | row_num | roll_max |                  argmax                  |
+-------+--------------+---------+---------+----------+------------------------------------------+
| "ABC" | "2020-06-10" | 322.390 |       1 |  322.390 | 1                                        |
| "ABC" | "2020-06-11" | 312.150 |       2 |  322.390 | 1                                        |
| "ABC" | "2020-06-12" | 309.080 |       3 |  322.390 | 1                                        |
| "ABC" | "2020-06-15" | 308.280 |       4 |  322.390 | 1                                        |
| "ABC" | "2020-06-16" | 315.640 |       5 |  322.390 | 1                                        |
| "ABC" | "2020-06-17" | 314.390 |       6 |  322.390 | 1                                        |
| "ABC" | "2020-06-18" | 312.300 |       7 |  322.390 | 1                                        |
| "ABC" | "2020-06-19" | 314.380 |       8 |  322.390 | 1                                        |
| "ABC" | "2020-06-22" | 311.050 |       9 |  322.390 | 1                                        |
| "ABC" | "2020-06-23" | 314.500 |      10 |  322.390 | 1                                        |
| "ABC" | "2020-06-24" | 310.510 |      11 |  322.390 | 1                                        |
| "ABC" | "2020-06-25" | 307.640 |      12 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-26" | 306.390 |      13 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-29" | 304.610 |      14 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-06-30" | 310.200 |      15 |  315.640 | NULL /* how to get row_num (5) here? */  |
| "ABC" | "2020-07-01" | 311.890 |      16 |  314.500 | NULL /* how to get row_num (10) here? */ |
| "ABC" | "2020-07-02" | 315.700 |      17 |  315.700 | 17                                       |
| "ABC" | "2020-07-06" | 317.680 |      18 |  317.680 | 18                                       |
+-------+--------------+---------+---------+----------+------------------------------------------+

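I understand that the query above only compares the current row with the rolling max and returns the row number when they match. That condition doesn't always hold, as the table above shows: 315.640 is the rolling max for rows 12 through 15, but that value comes from an earlier row in the window, not from the current row.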

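My question is: how do I get the value 5 instead of NULL in the example above, i.e. the row_num of the actual argmax (315.640 has row_num 5)? For the argmax, the row_num can be relative either to the whole table or to each window (the window size is 10 in this example).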

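I've looked at similar questions, but still can't get the result I want, because what I need is a rolling argmax rather than the argmax of the whole column/table.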

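Can anyone suggest a solution? I'm also open to using a UDF. I only have basic knowledge of aggregate UDFs, so my idea of keeping the last 10 values in a temporary array and taking their max doesn't seem efficient (I'm not even sure I could implement such an array function), and I'm out of ideas right now :/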
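For reference, the array idea in the previous paragraph may not need an aggregate UDF at all: array_agg used as a window function can already build the array of the values in each frame, and a small SQL function can return the position of its largest element. A minimal sketch follows; array_argmax is a hypothetical name, not a built-in, and ties resolve to the first (earliest) position because array_position returns the first match.

```sql
-- Hypothetical helper (not built into PostgreSQL): returns the 1-based position
-- of the first occurrence of the array's largest element (max() ignores NULLs).
CREATE OR REPLACE FUNCTION array_argmax(arr anyarray)
RETURNS integer
LANGUAGE sql
IMMUTABLE
AS $$
    SELECT array_position(arr, (SELECT max(x) FROM unnest(arr) AS u(x)))
$$;

-- Prices of rows 2..6 of the sample data: the max, 315.640, is the 4th element.
SELECT array_argmax(ARRAY[312.150, 309.080, 308.280, 315.640, 314.390]);  -- 4
```

Inside a window query it would take the place of an array_position(...) call, e.g. array_argmax(array_agg(price) over (partition by s_id order by s_date rows 10 preceding)). The result is still a position within the frame, so the number of rows before the frame has to be added to turn it into a row_num.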

It's a bit hard to read, but you can do the following:

  1. Collect all price values in the current window frame into an array with array_agg;
  2. Use array_position to find where the rolling maximum sits in that array;
  3. Turn that array position into a row_number() by adding the number of rows that come before the frame. rows 10 preceding covers 11 rows (the 10 preceding rows plus the current one), so that offset is row_number() - 11;
  4. Use GREATEST(row_number() - 11, 0) so the offset never goes negative near the start of a partition, where the frame still begins at row 1:
WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390),
           ('ABC', '2020-06-11'::date, 312.150),
           ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280),
           ('ABC', '2020-06-16'::date, 315.640),
           ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300),
           ('ABC', '2020-06-19'::date, 314.380),
           ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500),
           ('ABC', '2020-06-24'::date, 310.510),
           ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390),
           ('ABC', '2020-06-29'::date, 304.610),
           ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890),
           ('ABC', '2020-07-02'::date, 315.700),
           ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id,
       s_date,
       price,
       row_number() over (PARTITION BY s_id ORDER BY s_date) as row_num,
       max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max,
       -- rows before the frame (the frame holds 10 preceding rows + the current row = 11 rows)
       GREATEST(row_number() over (PARTITION BY s_id ORDER BY s_date) - 11, 0)
           -- 1-based position of the rolling max in the frame (first match on ties)
           + array_position(
                       array_agg(price) over (partition by s_id order by s_date rows 10 preceding),
                       max(price) over (partition by s_id order by s_date rows 10 preceding)
           ) as argmax
FROM sample_table
ORDER BY s_id, s_date

Alternatively, the same logic with a subquery, which is easier to read:

WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390),
           ('ABC', '2020-06-11'::date, 312.150),
           ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280),
           ('ABC', '2020-06-16'::date, 315.640),
           ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300),
           ('ABC', '2020-06-19'::date, 314.380),
           ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500),
           ('ABC', '2020-06-24'::date, 310.510),
           ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390),
           ('ABC', '2020-06-29'::date, 304.610),
           ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890),
           ('ABC', '2020-07-02'::date, 315.700),
           ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id, s_date, price, row_num, roll_max,
       GREATEST(row_num - 11, 0)    -- rows before the 11-row frame
           + array_position(
               prices,
               roll_max
           ) as argmax
FROM (
         SELECT s_id,
                s_date,
                price,
                row_number() over (PARTITION BY s_id ORDER BY s_date)                    as row_num,
                max(price) over (partition by s_id order by s_date rows 10 preceding)       as roll_max,
                array_agg(price)
                over (partition by s_id order by s_date rows 10 preceding)                  as prices
         FROM sample_table
     ) as s
ORDER BY s_id, s_date
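The row offset in the queries above is a constant that has to be kept in sync with the frame clause by hand. A possible alternative, sketched here under the assumption that the frame is a plain ROWS frame ending at the current row: count(*) over the same frame is the number of rows in it, so row_number() minus that count is exactly the number of rows before the frame, and no GREATEST is needed at the start of a partition. The named WINDOW clause keeps the frame definition in one place; moving_argmax_demo is a hypothetical view name used only so the result can be queried.

```sql
-- Sketch: derive the row offset from the frame instead of a hard-coded constant.
-- Assumes a ROWS frame ending at the current row; moving_argmax_demo is hypothetical.
CREATE TEMP VIEW moving_argmax_demo AS
WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390),
           ('ABC', '2020-06-11'::date, 312.150),
           ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280),
           ('ABC', '2020-06-16'::date, 315.640),
           ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300),
           ('ABC', '2020-06-19'::date, 314.380),
           ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500),
           ('ABC', '2020-06-24'::date, 310.510),
           ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390),
           ('ABC', '2020-06-29'::date, 304.610),
           ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890),
           ('ABC', '2020-07-02'::date, 315.700),
           ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id,
       s_date,
       price,
       row_number() OVER (PARTITION BY s_id ORDER BY s_date) AS row_num,
       max(price) OVER w AS roll_max,
       -- count(*) OVER w = rows in the frame, so the difference = rows before the frame
       row_number() OVER (PARTITION BY s_id ORDER BY s_date) - count(*) OVER w
           -- 1-based position of the (first) rolling max inside the frame
           + array_position(array_agg(price) OVER w, max(price) OVER w) AS argmax
FROM sample_table
WINDOW w AS (PARTITION BY s_id ORDER BY s_date ROWS 10 PRECEDING);

SELECT * FROM moving_argmax_demo ORDER BY s_id, s_date;
```

With the sample data this gives 1 for rows 1 to 11, 5 for rows 12 to 15, 10 for row 16, and 17 and 18 for the last two rows, which are the values asked for in the question. Changing the window size then only means editing the WINDOW clause.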