Using Max() in where clause in Presto SQL
I have the following table.
ID | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | planned | 2022-04-20 10:00AM |
1 | abcd | planned | 2022-04-25 12:00AM |
1 | abcd | in progress | 2022-04-26 4:00PM |
1 | abcd | in progress | 2022-05-04 11:00AM |
1 | abcd | in progress | 2022-05-06 12:00PM |
I only want to return the row with the latest updated_time, regardless of its progress, i.e.,
ID | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | in progress | 2022-05-06 12:00PM |
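I know that if I group by 'progress' (as shown below), I also get a 'planned' row that I don't need. I only need one row per ID, with its latest updated time.
I wrote the following query: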
select ID,desc,progress,updated_time
from t1
where updated_time IN (select ID, desc, progress, max(updated_time)
from t1 group by 1,2,3)
and I get the following error:
'Multiple columns returned by subquery are not yet supported'
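You are trying to match a single value against multiple columns, and that is what raises the error.
Based on what you want to achieve, you should use an inner join instead of an IN clause with a subquery: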
select t1.ID, t1.desc, t1.progress, t1.updated_time
from t1
INNER JOIN
-- group by ID only; also grouping by progress would keep the latest 'planned' row
( select ID, max(updated_time) max_time
from t1 group by ID) t on t.ID = t1.ID and t.max_time = t1.updated_time
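A minimal sketch for checking the grouping behaviour locally, using Python's built-in sqlite3 module as a stand-in for Presto. Assumptions: SQLite is not Presto, the question's AM/PM times are stored here as 24-hour ISO text, and `desc` is double-quoted because DESC is a keyword in SQLite. `ROWS`, `JOIN_TEMPLATE` and `run_join` are hypothetical names. The sketch runs the join-on-maximum pattern twice: grouped by id, desc and progress, then grouped by id only.

```python
import sqlite3

# The question's table; AM/PM times rewritten as 24-hour ISO text
# (an assumed storage format), which sorts chronologically as plain strings.
ROWS = [
    (1, "abcd", "planned", "2022-04-20 10:00"),
    (1, "abcd", "planned", "2022-04-25 00:00"),
    (1, "abcd", "in progress", "2022-04-26 16:00"),
    (1, "abcd", "in progress", "2022-05-04 11:00"),
    (1, "abcd", "in progress", "2022-05-06 12:00"),
]

# Join t1 back to its per-group maximum; {group_cols} decides what a "group" is.
JOIN_TEMPLATE = """
    SELECT t1.id, t1."desc", t1.progress, t1.updated_time
    FROM t1
    INNER JOIN (
        SELECT {group_cols}, MAX(updated_time) AS max_time
        FROM t1
        GROUP BY {group_cols}
    ) t ON t.id = t1.id AND t.max_time = t1.updated_time
    ORDER BY t1.updated_time
"""


def run_join(group_cols):
    """Load ROWS into an in-memory table t1 and run JOIN_TEMPLATE."""
    con = sqlite3.connect(":memory:")
    # "desc" is quoted because DESC is an SQL keyword in SQLite.
    con.execute(
        'CREATE TABLE t1 (id INTEGER, "desc" TEXT, progress TEXT, updated_time TEXT)'
    )
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(JOIN_TEMPLATE.format(group_cols=group_cols)).fetchall()
    con.close()
    return rows


# One maximum per (id, desc, progress): the latest 'planned' row survives too.
print(run_join('id, "desc", progress'))
# One maximum per id: only the latest row overall.
print(run_join("id"))
```

With progress in the GROUP BY, the latest 'planned' row comes back as well, which is exactly the extra row the question wants to avoid; grouping by id alone, and joining on both id and the timestamp, leaves one row per id.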
I would probably use row_number or some other ranking function for this:
with t as (
select a.*,
row_number() over (partition by id order by updated_time desc) as rn
from t1 a
)
select * from t where rn = 1
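The same ranking idea as a self-contained sketch, run with Python's sqlite3 module as a stand-in for Presto (an assumption; it needs SQLite 3.25 or newer, the release that added window functions). The table is the question's, with times rewritten as 24-hour ISO text, and `ROWS`, `QUERY` and `latest_per_id` are hypothetical names:

```python
import sqlite3

# The question's table, with times rewritten as 24-hour ISO text (assumption).
ROWS = [
    (1, "abcd", "planned", "2022-04-20 10:00"),
    (1, "abcd", "planned", "2022-04-25 00:00"),
    (1, "abcd", "in progress", "2022-04-26 16:00"),
    (1, "abcd", "in progress", "2022-05-04 11:00"),
    (1, "abcd", "in progress", "2022-05-06 12:00"),
]

# Number each id's rows from newest to oldest, then keep row number 1.
QUERY = """
    WITH t AS (
        SELECT a.*,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_time DESC) AS rn
        FROM t1 a
    )
    SELECT id, "desc", progress, updated_time
    FROM t
    WHERE rn = 1
    ORDER BY id
"""


def latest_per_id():
    """Run QUERY against an in-memory copy of ROWS."""
    con = sqlite3.connect(":memory:")
    # "desc" is quoted because DESC is an SQL keyword in SQLite.
    con.execute(
        'CREATE TABLE t1 (id INTEGER, "desc" TEXT, progress TEXT, updated_time TEXT)'
    )
    con.executemany("INSERT INTO t1 VALUES (?, ?, ?, ?)", ROWS)
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


print(latest_per_id())  # [(1, 'abcd', 'in progress', '2022-05-06 12:00')]
```

Listing the columns explicitly in the outer SELECT keeps the helper column rn out of the result.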
Selecting multiple values in a subquery won't work here; you need a scalar subquery that selects a single value:
-- sample data
WITH dataset (ID, Desc, progress, updated_time) AS (
VALUES
(1, 'abcd', 'planned', timestamp '2022-04-20 10:00'),
(1, 'abcd', 'planned', timestamp '2022-04-25 12:00'),
(1, 'abcd', 'in progress', timestamp '2022-04-26 16:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-04 11:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-06 12:00'),
(1, 'abcd', 'in progress', timestamp '2022-05-07 12:00'),
(2, 'abcd', 'in progress', timestamp '2022-05-04 11:00'),
(2, 'abcd', 'in progress', timestamp '2022-05-06 12:00')
)
--query
select id, Desc, progress, updated_time
from dataset o
where updated_time = (select max(updated_time) from dataset i where i.id = o.id)
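To try the correlated scalar subquery without a Presto cluster, here is a minimal sketch that runs it with Python's sqlite3 module. It assumes SQLite is an acceptable stand-in for this query: the answer's sample rows sit in a `WITH dataset (...) AS (VALUES ...)` CTE, the Presto `timestamp` literals become ISO-8601 text (which compares in chronological order), and `SAMPLE_CTE`, `QUERY` and `latest_via_scalar_subquery` are hypothetical names.

```python
import sqlite3

# The answer's sample rows as a VALUES CTE; ISO text replaces Presto's
# timestamp literals, and "desc" is quoted because DESC is a keyword in SQLite.
SAMPLE_CTE = """
    WITH dataset (id, "desc", progress, updated_time) AS (
        VALUES
        (1, 'abcd', 'planned', '2022-04-20 10:00'),
        (1, 'abcd', 'planned', '2022-04-25 12:00'),
        (1, 'abcd', 'in progress', '2022-04-26 16:00'),
        (1, 'abcd', 'in progress', '2022-05-04 11:00'),
        (1, 'abcd', 'in progress', '2022-05-06 12:00'),
        (1, 'abcd', 'in progress', '2022-05-07 12:00'),
        (2, 'abcd', 'in progress', '2022-05-04 11:00'),
        (2, 'abcd', 'in progress', '2022-05-06 12:00')
    )
"""

# For each outer row o, the scalar subquery returns exactly one value:
# the max updated_time among rows with the same id.
QUERY = SAMPLE_CTE + """
    SELECT id, "desc", progress, updated_time
    FROM dataset o
    WHERE updated_time = (SELECT MAX(updated_time) FROM dataset i WHERE i.id = o.id)
    ORDER BY id
"""


def latest_via_scalar_subquery():
    """Run QUERY on an empty in-memory database (the data lives in the CTE)."""
    con = sqlite3.connect(":memory:")
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in latest_via_scalar_subquery():
    print(row)
```

The two printed rows correspond to the output table in this answer, without the seconds and milliseconds that Presto's timestamp type displays.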
Or a similar approach using the max window function and a subselect:
--query
select id, Desc, progress, updated_time
from (
select *, max(updated_time) over (partition by id) max_time
from dataset
)
where max_time = updated_time
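The window-function variant can be checked the same way. This is a minimal sketch that runs it with Python's sqlite3 module as a stand-in for Presto (an assumption; window functions need SQLite 3.25 or newer). The answer's sample rows are kept in a VALUES CTE with ISO-8601 text instead of `timestamp` literals, and `SAMPLE_CTE`, `QUERY` and `latest_via_max_window` are hypothetical names.

```python
import sqlite3

# The answer's sample rows as a VALUES CTE; ISO text replaces Presto's
# timestamp literals, and "desc" is quoted because DESC is a keyword in SQLite.
SAMPLE_CTE = """
    WITH dataset (id, "desc", progress, updated_time) AS (
        VALUES
        (1, 'abcd', 'planned', '2022-04-20 10:00'),
        (1, 'abcd', 'planned', '2022-04-25 12:00'),
        (1, 'abcd', 'in progress', '2022-04-26 16:00'),
        (1, 'abcd', 'in progress', '2022-05-04 11:00'),
        (1, 'abcd', 'in progress', '2022-05-06 12:00'),
        (1, 'abcd', 'in progress', '2022-05-07 12:00'),
        (2, 'abcd', 'in progress', '2022-05-04 11:00'),
        (2, 'abcd', 'in progress', '2022-05-06 12:00')
    )
"""

# MAX(...) OVER (PARTITION BY id) attaches each id's latest timestamp to every
# row of that id without collapsing rows; the outer WHERE keeps the matches.
QUERY = SAMPLE_CTE + """
    SELECT id, "desc", progress, updated_time
    FROM (
        SELECT *, MAX(updated_time) OVER (PARTITION BY id) AS max_time
        FROM dataset
    )
    WHERE max_time = updated_time
    ORDER BY id
"""


def latest_via_max_window():
    """Run QUERY on an empty in-memory database (the data lives in the CTE)."""
    con = sqlite3.connect(":memory:")
    rows = con.execute(QUERY).fetchall()
    con.close()
    return rows


for row in latest_via_max_window():
    print(row)
```

Unlike GROUP BY, the window function leaves every row in place, so desc and progress never need to be part of any grouping.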
Or just use row_number:
select id, Desc, progress, updated_time
from
(
select *,
row_number() over(partition by id order by updated_time desc) rank
from dataset
)
where rank = 1
Output:
id | Desc | progress | updated_time |
---|---|---|---|
1 | abcd | in progress | 2022-05-07 12:00:00.000 |
2 | abcd | in progress | 2022-05-06 12:00:00.000 |
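One behaviour the answers above don't spell out is what happens when two rows of the same ID share the latest updated_time. The max-based filters keep every tied row, `rank()` (one of the "other ranking functions") does the same, and `row_number()` keeps exactly one, chosen arbitrarily unless the ORDER BY includes a tie-breaker. A minimal sketch on SQLite (3.25 or newer) as a stand-in for Presto; the tied data and the names `MAX_FILTER`, `RANK_FILTER`, `ROW_NUMBER_FILTER` and `rows_per_id` are hypothetical:

```python
import sqlite3

# Hypothetical data: id 2 has two rows sharing its latest timestamp.
# "desc" is quoted because DESC is an SQL keyword in SQLite.
SAMPLE_CTE = """
    WITH dataset (id, "desc", progress, updated_time) AS (
        VALUES
        (1, 'abcd', 'planned', '2022-04-25 12:00'),
        (1, 'abcd', 'in progress', '2022-05-07 12:00'),
        (2, 'abcd', 'planned', '2022-05-06 12:00'),
        (2, 'abcd', 'in progress', '2022-05-06 12:00')
    )
"""

# Keeps every row whose timestamp equals the per-id maximum.
MAX_FILTER = """
    SELECT id FROM (
        SELECT *, MAX(updated_time) OVER (PARTITION BY id) AS max_time FROM dataset
    ) WHERE max_time = updated_time
"""

# rank() gives tied rows the same rank, so ties are kept as well.
RANK_FILTER = """
    SELECT id FROM (
        SELECT *, RANK() OVER (PARTITION BY id ORDER BY updated_time DESC) AS rnk
        FROM dataset
    ) WHERE rnk = 1
"""

# row_number() numbers tied rows 1, 2, ...; which one gets 1 is arbitrary
# unless the ORDER BY has a tie-breaker.
ROW_NUMBER_FILTER = """
    SELECT id FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_time DESC) AS rn
        FROM dataset
    ) WHERE rn = 1
"""


def rows_per_id(filter_query):
    """Count how many rows each id keeps under the given filter."""
    con = sqlite3.connect(":memory:")
    counts = {}
    for (row_id,) in con.execute(SAMPLE_CTE + filter_query):
        counts[row_id] = counts.get(row_id, 0) + 1
    con.close()
    return counts


print(rows_per_id(MAX_FILTER))         # id 2 keeps both tied rows
print(rows_per_id(RANK_FILTER))        # same counts as the max filter
print(rows_per_id(ROW_NUMBER_FILTER))  # exactly one row per id
```

If the question's "only one row per ID" must hold even with ties, row_number with an extra ORDER BY column is the safer choice; if ties should all be reported, the max filter or rank fits better.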