How to get a moving-window argmax in PostgreSQL
I'm trying to find the moving argmax of a column in my database using window functions in PostgreSQL.
Here's what I have so far:
select *,
(max(case when price = roll_max then (row_num) end) over (partition by roll_max order by s_date)) as argmax
from (
select s_id, s_date, price,
row_number() over (partition by s_id order by s_date) as row_num,
max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max
from sample_table
) tb1
order by s_date
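The code above is adapted from another example. I had to add the partition by s_id because there are many different s_ids; the table's unique key is (s_id, s_date). So I need the argmax for every (s_id, s_date) pair, across all available dates.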
Here is the output I get for some sample data (window size 10):
+-------+--------------+---------+---------+----------+------------------------------------------+
| s_id | s_date | price | row_num | roll_max | argmax |
+-------+--------------+---------+---------+----------+------------------------------------------+
| "ABC" | "2020-06-10" | 322.390 | 1 | 322.390 | 1 |
| "ABC" | "2020-06-11" | 312.150 | 2 | 322.390 | 1 |
| "ABC" | "2020-06-12" | 309.080 | 3 | 322.390 | 1 |
| "ABC" | "2020-06-15" | 308.280 | 4 | 322.390 | 1 |
| "ABC" | "2020-06-16" | 315.640 | 5 | 322.390 | 1 |
| "ABC" | "2020-06-17" | 314.390 | 6 | 322.390 | 1 |
| "ABC" | "2020-06-18" | 312.300 | 7 | 322.390 | 1 |
| "ABC" | "2020-06-19" | 314.380 | 8 | 322.390 | 1 |
| "ABC" | "2020-06-22" | 311.050 | 9 | 322.390 | 1 |
| "ABC" | "2020-06-23" | 314.500 | 10 | 322.390 | 1 |
| "ABC" | "2020-06-24" | 310.510 | 11 | 322.390 | 1 |
| "ABC" | "2020-06-25" | 307.640 | 12 | 315.640 | NULL /* how to get row_num (5) here? */ |
| "ABC" | "2020-06-26" | 306.390 | 13 | 315.640 | NULL /* how to get row_num (5) here? */ |
| "ABC" | "2020-06-29" | 304.610 | 14 | 315.640 | NULL /* how to get row_num (5) here? */ |
| "ABC" | "2020-06-30" | 310.200 | 15 | 315.640 | NULL /* how to get row_num (5) here? */ |
| "ABC" | "2020-07-01" | 311.890 | 16 | 314.500 | NULL /* how to get row_num (10) here? */ |
| "ABC" | "2020-07-02" | 315.700 | 17 | 315.700 | 17 |
| "ABC" | "2020-07-06" | 317.680 | 18 | 317.680 | 18 |
+-------+--------------+---------+---------+----------+------------------------------------------+
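I know that the query above only checks whether the current row's price equals the rolling max and, if it does, returns that row number. That doesn't always work, as the table above shows: in rows 12 to 15 the rolling max is 315.640, but that value comes from an earlier row in the window (row 5), not from the current row.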
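To see why NULL appears, here is a minimal Python sketch (an illustration, not PostgreSQL itself) of what the outer window in the query above computes on the sample prices. It assumes a single s_id and the default running frame (partition start up to the current row) that `max(...) over (partition by roll_max order by s_date)` uses; the helper names are hypothetical. Because the partition key is roll_max, rows 12 to 15 form their own 315.640 partition, while row 5, which actually holds 315.640, belongs to the 322.390 partition. No row in the 315.640 partition satisfies price = roll_max, so max() sees only NULLs.

```python
# Sample prices from the question (one s_id, ordered by s_date).
PRICES = [322.390, 312.150, 309.080, 308.280, 315.640, 314.390, 312.300,
          314.380, 311.050, 314.500, 310.510, 307.640, 306.390, 304.610,
          310.200, 311.890, 315.700, 317.680]


def rolling_max(prices, preceding=10):
    """max(price) OVER (ORDER BY s_date ROWS <preceding> PRECEDING)."""
    return [max(prices[max(0, i - preceding):i + 1]) for i in range(len(prices))]


def question_argmax(prices, preceding=10):
    """Mimic max(CASE WHEN price = roll_max THEN row_num END)
    OVER (PARTITION BY roll_max ORDER BY s_date) with its default running frame."""
    roll_max = rolling_max(prices, preceding)
    running = {}  # roll_max value -> running max of matching row_num (None = SQL NULL)
    result = []
    for row_num, (price, rm) in enumerate(zip(prices, roll_max), start=1):
        best = running.get(rm)
        # CASE yields row_num only when the current row itself holds the max
        if price == rm and (best is None or row_num > best):
            best = row_num
        running[rm] = best
        result.append(best)
    return result


print(question_argmax(PRICES))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, None, None, None, None, None, 17, 18]
```

The printed list reproduces the argmax column of the table above, NULLs included.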
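My question is: how can I get the value 5 instead of NULL in the example above, i.e. the row_num of the actual argmax (the row_num of 315.640 is 5)? The row_num reported as argmax can be counted either over the whole table or within each window (in this example the window size is 10).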
I've looked at similar questions, but I still can't get the result I want, because what I need is a rolling argmax rather than an argmax over the entire table.
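Can anyone suggest a solution? I'm also open to using a UDF. I only have basic knowledge of aggregate UDFs, so my idea of keeping the last 10 values in a temporary array and taking their max seems inefficient (I'm not even sure I could implement such an array function), and I'm out of ideas now :/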
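To pin down the requested semantics, here is a brute-force sketch in Python rather than PostgreSQL: a reference to check a SQL solution against, not a replacement for one. It assumes a single s_id already sorted by s_date and mimics `rows 10 preceding`, so each frame holds up to 11 rows (the 10 preceding rows plus the current one). `rolling_argmax` is a hypothetical helper name, and ties go to the earliest row.

```python
# Sample prices from the question (one s_id, ordered by s_date).
PRICES = [322.390, 312.150, 309.080, 308.280, 315.640, 314.390, 312.300,
          314.380, 311.050, 314.500, 310.510, 307.640, 306.390, 304.610,
          310.200, 311.890, 315.700, 317.680]


def rolling_argmax(prices, preceding=10):
    """Return the 1-based row number of the max price in each frame.

    The frame for row n covers rows max(1, n - preceding) .. n, like SQL's
    ROWS <preceding> PRECEDING. Python's max() returns the first maximal
    element, so ties resolve to the earliest row.
    """
    result = []
    for n in range(1, len(prices) + 1):
        start = max(1, n - preceding)
        best = max(range(start, n + 1), key=lambda r: prices[r - 1])
        result.append(best)
    return result


print(rolling_argmax(PRICES))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 10, 17, 18]
```

This gives 5 for rows 12 to 15 and 10 for row 16, the values asked for above. It rescans every frame, so it costs O(n*k) for n rows and frame size k; it is meant only as a reference.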
Although it's a bit hard to read, you can do the following:
- collect all price values in the window frame into an array with array_agg;
- use array_position to find the position of the rolling max price in that array;
- turn that 1-based array position into a row_number() by adding the number of rows that precede the frame. The frame rows 10 preceding covers up to 11 rows (the 10 preceding rows plus the current one), so once row n is at least 11, the frame starts at row n - 10 and array position i corresponds to row (n - 11) + i;
- wrap the offset in GREATEST(row_number() - 11, 0) so it doesn't go negative for the first rows, whose frame still starts at row 1:
WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390),
           ('ABC', '2020-06-11'::date, 312.150),
           ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280),
           ('ABC', '2020-06-16'::date, 315.640),
           ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300),
           ('ABC', '2020-06-19'::date, 314.380),
           ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500),
           ('ABC', '2020-06-24'::date, 310.510),
           ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390),
           ('ABC', '2020-06-29'::date, 304.610),
           ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890),
           ('ABC', '2020-07-02'::date, 315.700),
           ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id,
       s_date,
       price,
       row_number() over (PARTITION BY s_id ORDER BY s_date) as row_num,
       max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max,
       -- number of rows before the frame start (0 while the frame begins at row 1)
       GREATEST(row_number() over (PARTITION BY s_id ORDER BY s_date) - 11, 0)
       -- 1-based position of the rolling max inside the frame (first match on ties)
       + array_position(
           array_agg(price) over (partition by s_id order by s_date rows 10 preceding),
           max(price) over (partition by s_id order by s_date rows 10 preceding)
         ) as argmax
FROM sample_table
ORDER BY s_id, s_date
Alternatively, using a subquery, which is easier to read:
WITH sample_table(s_id, s_date, price) AS (
    VALUES ('ABC', '2020-06-10'::date, 322.390),
           ('ABC', '2020-06-11'::date, 312.150),
           ('ABC', '2020-06-12'::date, 309.080),
           ('ABC', '2020-06-15'::date, 308.280),
           ('ABC', '2020-06-16'::date, 315.640),
           ('ABC', '2020-06-17'::date, 314.390),
           ('ABC', '2020-06-18'::date, 312.300),
           ('ABC', '2020-06-19'::date, 314.380),
           ('ABC', '2020-06-22'::date, 311.050),
           ('ABC', '2020-06-23'::date, 314.500),
           ('ABC', '2020-06-24'::date, 310.510),
           ('ABC', '2020-06-25'::date, 307.640),
           ('ABC', '2020-06-26'::date, 306.390),
           ('ABC', '2020-06-29'::date, 304.610),
           ('ABC', '2020-06-30'::date, 310.200),
           ('ABC', '2020-07-01'::date, 311.890),
           ('ABC', '2020-07-02'::date, 315.700),
           ('ABC', '2020-07-06'::date, 317.680)
)
SELECT s_id, s_date, price, row_num, roll_max,
       -- rows before the frame start + position of the max inside the frame
       GREATEST(row_num - 11, 0) + array_position(prices, roll_max) as argmax
FROM (
    SELECT s_id,
           s_date,
           price,
           row_number() over (PARTITION BY s_id ORDER BY s_date) as row_num,
           max(price) over (partition by s_id order by s_date rows 10 preceding) as roll_max,
           array_agg(price)
               over (partition by s_id order by s_date rows 10 preceding) as prices
    FROM sample_table
) as s
ORDER BY s_id, s_date
For the sample data both queries return argmax 1 for rows 1 to 11, 5 for rows 12 to 15, 10 for row 16, and 17 and 18 for the last two rows. array_position (available since PostgreSQL 9.5) returns the first match, so if the maximum occurs more than once in a frame, the earliest row wins.
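A small Python sketch (not PostgreSQL) that replays the arithmetic of the queries above on the question's sample prices, as a check on the offset. The slice stands in for array_agg over the `rows 10 preceding` frame, `list.index` for array_position (1-based, first match), and `max(n - 11, 0)` for `GREATEST(row_num - 11, 0)`; `answer_argmax` is a hypothetical name. The inner assertion shows the offset always equals n minus the frame length, which suggests `row_num - cardinality(prices) + array_position(prices, roll_max)` as an equivalent SQL formulation that doesn't hard-code the frame size; that SQL variant is an untested assumption.

```python
# Sample prices from the question (one s_id, ordered by s_date).
PRICES = [322.390, 312.150, 309.080, 308.280, 315.640, 314.390, 312.300,
          314.380, 311.050, 314.500, 310.510, 307.640, 306.390, 304.610,
          310.200, 311.890, 315.700, 317.680]


def answer_argmax(prices, preceding=10):
    """Replay GREATEST(row_num - (preceding + 1), 0) + array_position(...) row by row."""
    result = []
    for n in range(1, len(prices) + 1):               # n = row_num (1-based)
        frame = prices[max(0, n - 1 - preceding):n]   # array_agg(price) over the frame
        roll_max = max(frame)                          # max(price) over the frame
        pos = frame.index(roll_max) + 1                # array_position: 1-based, first match
        offset = max(n - (preceding + 1), 0)           # GREATEST(row_num - 11, 0)
        assert offset == n - len(frame)                # same as row_num - cardinality(prices)
        result.append(offset + pos)
    return result


print(answer_argmax(PRICES))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5, 5, 5, 5, 10, 17, 18]
```

For row 12, for instance, the frame is rows 2 to 12, 315.640 sits at array position 4, and the offset is 12 - 11 = 1, giving row 5.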