Using Max() in where clause in Presto SQL

Question

I have the following table.

ID	Desc	progress	updated_time
1	abcd	planned	2022-04-20 10:00AM
1	abcd	planned	2022-04-25 12:00AM
1	abcd	in progress	2022-04-26 4:00PM
1	abcd	in progress	2022-05-04 11:00AM
1	abcd	in progress	2022-05-06 12:00PM

I only want to return the row with the latest updated_time, regardless of its progress, i.e.,

ID	Desc	progress	updated_time
1	abcd	in progress	2022-05-06 12:00PM

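I know that if I group by 'progress' (as shown below), I will also get a 'planned' row that I don't need. I only need one row per ID, with its latest updated time.

I wrote the query below,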

select ID,desc,progress,updated_time 
from t1 
where updated_time IN (select ID, desc, progress, max(updated_time) 
from t1 group by 1,2,3)

I am also getting the following error: 'Multiple columns returned by subquery are not yet supported'

Answer 1

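You are trying to match a single value against multiple columns, and this raises the error.

Looking at your code and your goal, you should use an inner join instead of an IN clause based on a subquery: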

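-- columns are qualified with t1 because ID exists on both sides of the join
select t1.ID, t1.desc, t1.progress, t1.updated_time
from t1
INNER JOIN
-- group by ID only, so that progress does not split the groups
( select ID, max(updated_time) max_time
from t1 group by 1) t on t.ID = t1.ID and t.max_time = t1.updated_time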
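
For context on the error: IN compares one value with a one-column subquery, so the subquery may return only a single column. The sketch below is an illustration under assumptions, not the asker's actual table: the inline t1 is made up from sample rows in this thread, and "desc" is double-quoted only as a precaution because DESC is also a keyword.

```sql
-- Assumed stand-in for t1, built from sample rows in this thread
WITH t1 (ID, "desc", progress, updated_time) AS (
    VALUES
        (1, 'abcd', 'planned',     timestamp '2022-04-20 10:00'),
        (1, 'abcd', 'in progress', timestamp '2022-05-06 12:00'),
        (1, 'abcd', 'in progress', timestamp '2022-05-07 12:00'),
        (2, 'abcd', 'in progress', timestamp '2022-05-06 12:00')
)
SELECT ID, "desc", progress, updated_time
FROM t1
-- the subquery returns exactly one column, so IN is accepted
WHERE updated_time IN (SELECT max(updated_time) FROM t1 GROUP BY ID)
```

This version runs, but it returns three rows: the ID 1 row at 2022-05-06 12:00 is included because that timestamp is the maximum for ID 2. Matching on ID as well as on the timestamp (through a join or a correlated subquery) avoids that.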

Answer 2

I would probably use row_number or some other ranking function for this.

with t as (select a.*,
 -- number the rows of each id, newest first
 row_number() over (partition by id order by updated_time desc) as rn
 from t1 a)
select * from t where rn = 1
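
The choice of ranking function matters when two rows share the latest updated_time. The following is a minimal sketch of that case, assuming a made-up inline t1 with a deliberate tie; the names ranked, rn, and rk are hypothetical, and "desc" is double-quoted only as a precaution because DESC is also a keyword.

```sql
-- Assumed data: two rows for ID 1 share the latest timestamp
WITH t1 (ID, "desc", progress, updated_time) AS (
    VALUES
        (1, 'abcd', 'planned',     timestamp '2022-04-20 10:00'),
        (1, 'abcd', 'planned',     timestamp '2022-05-06 12:00'),
        (1, 'abcd', 'in progress', timestamp '2022-05-06 12:00')
),
ranked AS (
    SELECT t1.*,
           -- row_number gives tied rows distinct numbers
           row_number() OVER (PARTITION BY ID ORDER BY updated_time DESC) AS rn,
           -- rank gives tied rows the same number
           rank()       OVER (PARTITION BY ID ORDER BY updated_time DESC) AS rk
    FROM t1
)
SELECT ID, "desc", progress, updated_time, rn, rk
FROM ranked
WHERE rk = 1
```

Both tied rows come back with rk = 1, while rn numbers them 1 and 2 in an unspecified order, so filtering on rn = 1 would keep just one of them. Adding a tie-breaking column to the ORDER BY makes that choice deterministic.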

Answer 3

Selecting multiple values in a subquery will not work; you need to select a single value with a scalar subquery:

-- sample data
WITH dataset (ID, Desc, progress, updated_time) AS (
    VALUES 
(1, 'abcd', 'planned',  timestamp '2022-04-20 10:00'),
(1, 'abcd', 'planned',  timestamp '2022-04-25 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-04-26 16:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-06 12:00'),
(1, 'abcd', 'in progress',  timestamp '2022-05-07 12:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-04 11:00'),
(2, 'abcd', 'in progress',  timestamp '2022-05-06 12:00')
) 

--query
select  id, Desc, progress, updated_time
from dataset o
where updated_time = (select max(updated_time) from dataset i where i.id = o.id)

Or a similar approach using the max window function and a subselect:

--query
select  id, Desc, progress, updated_time
from (
    select *,  max(updated_time) over (partition by id) max_time
    from dataset
)
where max_time = updated_time

Or just use row_number:

select  id, Desc, progress, updated_time
from 
(
    select *,  
        row_number() over(partition by id order by updated_time desc) rank
    from dataset
)
where rank  = 1

Output:

id	Desc	progress	updated_time
1	abcd	in progress	2022-05-07 12:00:00.000
2	abcd	in progress	2022-05-06 12:00:00.000
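
As a further option that none of the answers use, Presto's max_by(x, y) aggregate returns the value of x from the row where y is largest, which may allow a plain GROUP BY with no subquery. This is a sketch under assumptions: the inline dataset repeats part of the sample data, and "desc" is double-quoted only as a precaution because DESC is also a keyword.

```sql
-- Assumed subset of the sample data
WITH dataset (ID, "desc", progress, updated_time) AS (
    VALUES
        (1, 'abcd', 'planned',     timestamp '2022-04-25 12:00'),
        (1, 'abcd', 'in progress', timestamp '2022-05-06 12:00'),
        (1, 'abcd', 'in progress', timestamp '2022-05-07 12:00'),
        (2, 'abcd', 'in progress', timestamp '2022-05-04 11:00'),
        (2, 'abcd', 'in progress', timestamp '2022-05-06 12:00')
)
SELECT ID,
       -- value of each column taken from the row with the largest updated_time
       max_by("desc", updated_time)   AS "desc",
       max_by(progress, updated_time) AS progress,
       max(updated_time)              AS updated_time
FROM dataset
GROUP BY ID
```

This yields one row per ID. If several rows tie on the latest updated_time, each max_by call picks its value independently, so row_number is likely the safer choice when whole rows must stay intact.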
