滞后函数获取最后一个不同的值(红移)

lag function to get the last different value(redshift)

我有一个示例数据如下,想要得到想要的 o/p,请帮我出出主意。

我希望第 3、4 行 prev_diff_value 的 o/p 为 2015-01-01 00:00:00 而不是 2015-01-02 00:00:00.

with dat as (
            select 1 as id,'20150101 02:02:50'::timestamp as dt union all
            select 1,'20150101 03:02:50'::timestamp union all
            select 1,'20150101 04:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:50'::timestamp union all
            select 1,'20150102 02:02:51'::timestamp union all
            select 1,'20150103 02:02:50'::timestamp union all
            select 2,'20150101 02:02:50'::timestamp union all
            select 2,'20150101 03:02:50'::timestamp union all
            select 2,'20150101 04:02:50'::timestamp union all
            select 2,'20150102 02:02:50'::timestamp union all
            select 1,'20150104 02:02:50'::timestamp
            )-- select * from dat
   select id , dt , lag(trunc(dt)) over(partition by id order by dt asc) prev_diff_value
   from dat
  order by id,dt desc
O/P : 
   id   dt                    prev_diff_value
   1    2015-01-04 02:02:50   2015-01-03 00:00:00
   1    2015-01-03 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:51   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-02 00:00:00
   1    2015-01-02 02:02:50   2015-01-01 00:00:00

据我了解,您想获取 id 分区内每个时间戳的前一个不同日期。然后,我将针对 iddate 的独特组合应用 lag,并像这样连接回原始数据集:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
,dat_unique_lag as (
    select *, lag(date) over(partition by id order by date asc) prev_diff_value
    from (
        select distinct id,trunc(dt) as date
        from dat
    )
)
select *
from dat
join dat_unique_lag
using (id)
where trunc(dat.dt)=dat_unique_lag.date
order by id,dt desc;

然而,这并不是超级性能。如果你的数据的性质是同一天的时间戳数量有限,你可能只是用这样的条件语句来延长你的延迟:

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select id, dt,
case 
    when lag(trunc(dt)) over(partition by id order by dt asc)=trunc(dt)
    then case 
        when lag(trunc(dt),2) over(partition by id order by dt asc)=trunc(dt)
        then case
            when lag(trunc(dt),3) over(partition by id order by dt asc)=trunc(dt)
            then lag(trunc(dt),4) over(partition by id order by dt asc)
            else lag(trunc(dt),3) over(partition by id order by dt asc)
            end
        else lag(trunc(dt),2) over(partition by id order by dt asc)
        end
    else lag(trunc(dt)) over(partition by id order by dt asc)
end as prev_diff_value
from dat
order by id,dt desc;

基本上,您会查看之前的记录,如果它不适合您,那么您会回顾之前的记录,依此类推,直到找到正确的记录或从您的陈述中找到 运行深度。在这里查找直到返回第 4 条记录。

这是看待问题的另一种方式,虽然效率不高,但还是挺有趣的。

with dat as (
    select 1 as id,'20150101 02:02:50'::timestamp as dt union all
    select 1,'20150101 03:02:50'::timestamp union all
    select 1,'20150101 04:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:50'::timestamp union all
    select 1,'20150102 02:02:51'::timestamp union all
    select 1,'20150103 02:02:50'::timestamp union all
    select 2,'20150101 02:02:50'::timestamp union all
    select 2,'20150101 03:02:50'::timestamp union all
    select 2,'20150101 04:02:50'::timestamp union all
    select 2,'20150102 02:02:50'::timestamp union all
    select 1,'20150104 02:02:50'::timestamp
)
select distinct
dat.id
,dat.dt
,last_value(dat2.d) over (partition by dat.id, dat.dt order by dat2.d asc rows between unbounded preceding and unbounded following) as prev_diff_value
from dat
left join (
    select distinct
    id
    ,trunc(dt) as d
    from dat) dat2 on dat.id = dat2.id and trunc(dat.dt) > dat2.d
order by 1,2,3;

这将提取不同的 id 和日期对,并仅在连接日期早于相关行的情况下将它们重新连接到数据集。然后,last_value 函数将获取每行的最后一个值,并且 distinct 从输出中删除所有不相关的行。我知道这个问题已经有几年了-但我偶然发现了它并且玩得很开心。