Is there a way to check multiple columns using an "IN" condition in Redshift Spectrum?

Question

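I have a Redshift Spectrum table named customer_details_table in which the id column is not unique. I also have another column, hierarchy, which determines which record should take priority when several records share the same id. Here is an example: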

Here, when the same id appears more than once, as with 28846, we choose John as the qualified record because his hierarchy is the highest.

I am trying to create this eligibility column by grouping on id and then selecting the record that corresponds to the maximum hierarchy value. Here is my SQL code:

SELECT *,
CASE WHEN (
     (id , hierarchy) IN 
            (SELECT id , max(hierarchy)
            FROM
              customer_details_table
            GROUP BY id
            )
) THEN 'Qualified' ELSE 'Disqualified' END as eligibility
FROM
  customer_details_table

When I run it, I get the following error:

SQL Error [500310] [XX000]: [Amazon](500310) Invalid operation: This type of IN/NOT IN query is not supported yet;

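The code above works fine when my table (customer_details_table) is a regular Redshift table, but it fails when the same table is an external Spectrum table. Can anyone suggest a good solution or alternative for implementing the same logic on Spectrum tables?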

Answer 1

You can use a window function:

select cdt.*
from (select cdt.*,
             row_number() over (partition by id order by hierarchy desc) as seqnum
      from customer_details_table cdt
     ) cdt
where seqnum = 1;
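
If you would rather keep the shape of the original query and the eligibility column, the multi-column IN can also be rewritten as a join against the aggregated subquery. The following is only a sketch: the alias m and the column name max_hierarchy are made up for illustration, and I have not run it against an external Spectrum table.

```sql
-- Sketch: replace the (id, hierarchy) IN (...) test with a LEFT JOIN.
-- "m" and "max_hierarchy" are illustrative names, not from the question.
select cdt.*,
       case when m.id is not null
            then 'Qualified' else 'Disqualified'
       end as eligibility
from customer_details_table cdt
left join (select id, max(hierarchy) as max_hierarchy
           from customer_details_table
           group by id
          ) m
       on cdt.id = m.id
      and cdt.hierarchy = m.max_hierarchy;
```

The subquery returns one row per id, so the join does not duplicate rows. Like the original IN query, it marks every row that ties for the maximum hierarchy of its id as 'Qualified'.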

Answer 2

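You can use a window function to generate the eligibility column:

Basically, you need to partition the rows by id and, within each group, order them by hierarchy in descending order.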

select
    *,
    case when row_number() over(partition by id order by hierarchy desc) = 1
        then 'Qualified' else 'Disqualified'
    end eligibility
from customer_details_table
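
One caveat: row_number() qualifies exactly one row per id, even when several rows tie for the highest hierarchy, and which of the tied rows gets picked is not deterministic. The IN query from the question would have marked all tied rows as 'Qualified'. If that is the behavior you want, a variant with rank() is one possible sketch, assuming hierarchy is never NULL (in Redshift, NULLs sort first in descending order by default, so a NULL hierarchy would otherwise be ranked 1):

```sql
-- Sketch: rank() assigns 1 to every row tied for the top hierarchy,
-- so all of them are marked 'Qualified' (row_number() would pick just one).
-- Assumes hierarchy is never NULL.
select
    *,
    case when rank() over(partition by id order by hierarchy desc) = 1
        then 'Qualified' else 'Disqualified'
    end as eligibility
from customer_details_table
```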

sql

greatest-n-per-group

window-functions

amazon-redshift

amazon-redshift-spectrum