Redshift: Sorting order breaks when used with DISTINCT clause

Question

I have data like the following:

select study_id , updated_by ,created_at 
from my_table ps 
where study_id = '1';

I want to sort the records by created_at in descending order and select the distinct study_id and updated_by pairs.

I am running into a strange issue on Redshift. Consider the following query:

select study_id , updated_by 
from my_table ps 
where study_id = '1' 
ORDER BY created_at DESC ;

This results in:

But I only need the distinct records, so I used this query:

select DISTINCT study_id , updated_by 
from my_table ps
where study_id = '1' 
ORDER BY created_at DESC ;

This results in:

As you can see, the maya2 record now appears as the latest instead of maya1.

Why does the sort order break with DISTINCT? How can I fix this?

Answer 1

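Your sort is not broken. As Zaynul pointed out, you are ordering by created_at DESC (and we cannot see that data in your example), so whatever value created_at holds in the <1, maya2> row is greater than the value it holds in the other rows.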

After the data update: you have two maya1 rows, one with created_at < the maya2 row's timestamp and one with created_at > the maya2 row's timestamp.

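In your result, the DISTINCT operation discarded the maya1 row with timestamp > the maya2 row's and kept the earlier one. DISTINCT non-deterministically selects one row from each set of rows that share the same key of interest, so the created_at value that survives for sorting is not guaranteed.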
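To make the ambiguity concrete, here is a minimal sketch built on made-up rows. The `sample` CTE and the date are assumptions for illustration only; the times 7:29, 7:30 and 7:31 are the ones discussed in this thread. It lists, for each (study_id, updated_by) pair, both timestamps that a collapsed row could carry.

```sql
-- Hypothetical sample rows standing in for my_table (the date is invented).
WITH sample AS (
    SELECT '1' AS study_id, 'maya1' AS updated_by,
           TIMESTAMP '2020-01-01 07:29:00' AS created_at
    UNION ALL
    SELECT '1', 'maya2', TIMESTAMP '2020-01-01 07:30:00'
    UNION ALL
    SELECT '1', 'maya1', TIMESTAMP '2020-01-01 07:31:00'
)
SELECT study_id,
       updated_by,
       MIN(created_at) AS earliest,  -- the oldest row of the pair
       MAX(created_at) AS latest     -- the newest row of the pair
FROM sample
GROUP BY study_id, updated_by
ORDER BY latest DESC;                -- state explicitly which timestamp drives the order
```

With these rows, maya1 has an earliest value of 07:29 and a latest value of 07:31, while maya2 has 07:30 for both. Ordering by latest puts maya1 first; ordering by earliest would put maya2 first. A plain DISTINCT does not say which of the two timestamps is used, which is why the order looks broken.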

Answer 2

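So in your dataset only maya1 is duplicated: maya1 holds the timestamps 7:31 and 7:29, while maya2 holds 7:30. When you use DISTINCT, the query engine removes the maya1 row holding 7:31, and as a result maya2 takes the top position.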

If you need the latest one, just use max:

select study_id , updated_by ,max(created_at) as created_at
from my_table ps
where study_id = '1' 
group by study_id , updated_by 
ORDER BY created_at DESC

If you only need study_id and updated_by, use row_number():

select study_id , updated_by from
( select study_id , updated_by , created_at ,
   -- rn = 1 marks the newest row of each (study_id, updated_by) pair
   row_number() over(partition by study_id , updated_by ORDER BY created_at DESC ) rn
    from my_table ps
    where study_id = '1' 
 ) a where a.rn=1
ORDER BY a.created_at DESC

Answer 3

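It is surprising that the statement

SELECT DISTINCT study_id , updated_by 
FROM my_table ps
WHERE study_id = '1' 
ORDER BY created_at DESC ;

is valid at all. It makes no sense to order a result set by an attribute that is not present in that result set.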

If you are trying to achieve the equivalent of PostgreSQL's DISTINCT ON, Redshift probably does not have it, so you have to do it a different way, using a subquery:

WITH t AS (
   SELECT study_id, updated_by
        , max(created_at) created_at -- Or min(created_at) - whatever you need
   FROM my_table ps
   WHERE study_id = '1' 
   GROUP BY study_id, updated_by
)
SELECT study_id, updated_by
FROM t
ORDER BY created_at DESC
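For comparison, this is roughly what the DISTINCT ON form mentioned above looks like in PostgreSQL itself. It is a sketch for PostgreSQL only and is not expected to run on Redshift, which, as this answer suspects, does not appear to offer DISTINCT ON. The table and column names are taken from the question; the alias `t` is arbitrary.

```sql
-- PostgreSQL only: keep one row per (study_id, updated_by), choosing the newest.
SELECT study_id, updated_by
FROM (
    SELECT DISTINCT ON (study_id, updated_by)
           study_id, updated_by, created_at
    FROM my_table
    WHERE study_id = '1'
    -- The DISTINCT ON expressions must come first in ORDER BY;
    -- created_at DESC then decides which row of each pair is kept.
    ORDER BY study_id, updated_by, created_at DESC
) t
ORDER BY created_at DESC;  -- final presentation order, newest pair first
```

The inner ORDER BY controls which row survives for each pair, and the outer ORDER BY controls the order of the result, which is the same split of responsibilities that the GROUP BY version above achieves with max(created_at).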

sql

sorting

distinct

amazon-redshift