Two-dimensional comparison in SQL

Database schema

CREATE TABLE newsletter_status
(
    cryptid varchar(255) NOT NULL,
    status varchar(25),
    regDat timestamp,
    confirmDat timestamp,
    updateDat timestamp,
    deleteDat timestamp
);

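There are rows with the same cryptid, and I need to squash them into a single row so that cryptid becomes unique. The complexity comes from the fact that I need to compare dates both row-wise and column-wise. How can this be implemented?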

The rules I need to use are:

Example:

002bc5 | new         | 2010.01.15 | 2001.01.15 | NULL       | 2020.01.10
002bc5 | confirmed   | NULL       | 2020.01.30 | 2020.01.15 | 2020.01.15
002bc5 | deactivated | NULL       | NULL       | NULL       | 2020.12.03

needs to be squashed into:

002bc5 | deactivated | 2010.01.15 | 2020.01.30 | 2020.01.15 | 2020.12.03

The status deactivated is taken because its timestamp, 2020.12.03, is the latest.

You can use aggregation:

select cryptid,
       coalesce(max(case when status = 'deactivated' then status end),
                max(case when status = 'confirmed' then status end),
                max(case when status = 'new' then status end)
               ) as status,
       max(regDat) as regDat,
       max(confirmDat) as confirmDat,
       max(updateDat) as updateDat,
       max(deleteDat) as deleteDat
from newsletter_status
group by cryptid;

The coalesce() is the trick that picks the status in priority order: deactivated first, then confirmed, then new.
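As a quick sanity check, the priority-based query can be run against the three sample rows from the question. The sketch below is only a stand-in for the asker's database: it assumes Python's built-in sqlite3 module and stores the dates as ISO-8601 text (so that max() orders them correctly), neither of which comes from the original post.

```python
import sqlite3

# In-memory SQLite database used as a stand-in (assumption, not the asker's DBMS).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE newsletter_status (
        cryptid    varchar(255) NOT NULL,
        status     varchar(25),
        regDat     timestamp,
        confirmDat timestamp,
        updateDat  timestamp,
        deleteDat  timestamp
    )
""")
# The three sample rows from the question, with dates rewritten as ISO-8601 text.
conn.executemany(
    "INSERT INTO newsletter_status VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("002bc5", "new",         "2010-01-15", "2001-01-15", None,         "2020-01-10"),
        ("002bc5", "confirmed",   None,         "2020-01-30", "2020-01-15", "2020-01-15"),
        ("002bc5", "deactivated", None,         None,         None,         "2020-12-03"),
    ],
)

priority_query = """
    select cryptid,
           coalesce(max(case when status = 'deactivated' then status end),
                    max(case when status = 'confirmed' then status end),
                    max(case when status = 'new' then status end)
                   ) as status,
           max(regDat), max(confirmDat), max(updateDat), max(deleteDat)
    from newsletter_status
    group by cryptid
"""
squashed = conn.execute(priority_query).fetchall()
print(squashed)
```

Each date column is reduced independently with max(), which ignores NULLs, so the single output row combines values that originally sat on different input rows.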

EDIT:

If you just want the status from the row with the most recent timestamp:

select cryptid,
       max(case when seqnum = 1 then status end) as status_on_max_date,
       max(regDat),
       max(confirmDat),
       max(updateDat),
       max(deleteDat)
from (select ns.*,
             -- order descending so that seqnum = 1 is the row with the latest timestamp
             row_number() over (partition by cryptid
                                order by greatest(coalesce(regDat, '2000-01-01'),
                                                  coalesce(confirmDat, '2000-01-01'),
                                                  coalesce(updateDat, '2000-01-01'),
                                                  coalesce(deleteDat, '2000-01-01')
                                                 ) desc
                               ) as seqnum
      from newsletter_status ns
     ) ns
group by cryptid;

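I would start by ranking the rows of each cryptid by the greatest value across the date columns. We can then use that information to identify the latest status per cryptid, and aggregate: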

select cryptid,
    max(case when rn = 1 then status end) as status,
    max(regDat) as regDat,
    max(confirmDat) as confirmDat,
    max(updateDat) as updateDat,
    max(deleteDat) as deleteDat
from (
    select ns.*,
        row_number() over(
            partition by cryptid
            -- latest row first, so rn = 1 carries the most recent status
            order by greatest(
                coalesce(regDat,     '0001-01-01'),
                coalesce(confirmDat, '0001-01-01'),
                coalesce(updateDat,  '0001-01-01'),
                coalesce(deleteDat,  '0001-01-01')
            ) desc
        ) rn
    from newsletter_status ns
) ns
group by cryptid
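The ranking approach can be tried out the same way. This is a minimal sketch under several assumptions: it uses Python's sqlite3 module (window functions need SQLite 3.25 or newer), SQLite's multi-argument max() stands in for greatest(), dates are ISO-8601 text, and the second cryptid, 9f3e1a, is invented purely to show a case where "latest" and "highest priority" disagree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE newsletter_status (
        cryptid text NOT NULL, status text,
        regDat text, confirmDat text, updateDat text, deleteDat text
    )
""")
conn.executemany(
    "INSERT INTO newsletter_status VALUES (?, ?, ?, ?, ?, ?)",
    [
        # Sample rows from the question.
        ("002bc5", "new",         "2010-01-15", "2001-01-15", None,         "2020-01-10"),
        ("002bc5", "confirmed",   None,         "2020-01-30", "2020-01-15", "2020-01-15"),
        ("002bc5", "deactivated", None,         None,         None,         "2020-12-03"),
        # Hypothetical rows (not in the question): deactivated first, confirmed later.
        ("9f3e1a", "deactivated", None,         None,         None,         "2020-01-01"),
        ("9f3e1a", "confirmed",   None,         "2020-06-01", None,         None),
    ],
)

latest_query = """
    select cryptid,
           max(case when rn = 1 then status end) as status,
           max(regDat), max(confirmDat), max(updateDat), max(deleteDat)
    from (select ns.*,
                 row_number() over (
                     partition by cryptid
                     -- SQLite has no greatest(); multi-argument max() is its scalar equivalent
                     order by max(coalesce(regDat,     '0001-01-01'),
                                  coalesce(confirmDat, '0001-01-01'),
                                  coalesce(updateDat,  '0001-01-01'),
                                  coalesce(deleteDat,  '0001-01-01')) desc
                 ) as rn
          from newsletter_status ns) ns
    group by cryptid
    order by cryptid
"""
latest = conn.execute(latest_query).fetchall()
for row in latest:
    print(row)
```

For 9f3e1a the status comes out as confirmed, because that row holds the most recent timestamp; a fixed priority order would have reported deactivated instead. The coalesce() calls matter here: without them, a single NULL date would make the whole row's greatest value NULL.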

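What you need in order to get the status is to sort the row set by date in descending order. In Oracle there is agg_func(<arg>) keep (dense_rank first ...); in other databases it can be replaced with row_number() plus a filter. Because analytic functions in HANA sometimes do not work well, I suggest using the only aggregate function I know of in HANA that supports internal ordering, STRING_AGG, with a little trick. It will work as long as you do not have thousands of status rows per cryptid (i.e. the concatenated status varchar does not exceed 4000 characters). Here is the query: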

select
  cryptid,
  max(regDat) as regDat,
  max(confirmDat) as confirmDat,
  max(updateDat) as updateDat,
  max(deleteDat) as deleteDat,
  substr_before(
    -- append a trailing '|' so a cryptid with a single row still has a delimiter to cut at
    string_agg(status, '|'
      order by greatest(
        ifnull(regDat, date '1000-01-01'),
        ifnull(confirmDat, date '1000-01-01'),
        ifnull(updateDat, date '1000-01-01'),
        ifnull(deleteDat, date '1000-01-01')
      ) desc) || '|',
    '|'
  ) as status
from newsletter_status
group by cryptid
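The logic of the STRING_AGG trick can be illustrated outside the database. The sketch below is plain Python rather than HANA SQL, and the helper name first_status and the tuple layout are made up for illustration: it sorts the rows by their greatest date in descending order, concatenates the statuses with '|', and keeps what precedes the first delimiter.

```python
def first_status(rows, floor="1000-01-01"):
    """Return the status of the row with the latest date.

    rows: tuples of (status, regDat, confirmDat, updateDat, deleteDat),
    where dates are ISO-8601 strings or None (hypothetical layout).
    """
    # Newest row first: mirrors ORDER BY greatest(ifnull(...), ...) DESC inside STRING_AGG.
    ordered = sorted(
        rows,
        key=lambda r: max(d or floor for d in r[1:]),
        reverse=True,
    )
    concatenated = "|".join(r[0] for r in ordered)
    # Keep only the text before the first delimiter, i.e. the newest status.
    return concatenated.partition("|")[0]


sample = [
    ("new",         "2010-01-15", "2001-01-15", None,         "2020-01-10"),
    ("confirmed",   None,         "2020-01-30", "2020-01-15", "2020-01-15"),
    ("deactivated", None,         None,         None,         "2020-12-03"),
]
print(first_status(sample))
print(first_status(sample[:1]))
```

Python's str.partition returns the whole string when no delimiter is present, so a group with a single row still yields its status. A database function that cuts at the delimiter may behave differently when the delimiter is missing, so that case is worth checking on the target system. The trick also assumes that no status value itself contains '|'.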