Remove duplicates based on multiple columns and datetime

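I want to remove duplicate rows that share the same visitor_id, keeping only the row with the earliest datetime. For example, for visitor_id 2643331144 I want to keep row 1 because it has the earlier visit datetime, and keep the channel and visit_page from that same row. For visitor_id 1092581226 I want to keep row 3.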

rowno  visitor_id  datetime              channel         visit_page
1      2643331144  10/3/2021 4:05:29 PM  email           landing page
2      2643331144  10/3/2021 4:05:39 PM  organic search  landing page
3      1092581226  10/7/2021 1:08:12 PM  email           price reduced
4      1092581226  10/7/2021 1:08:44 PM  organic search  landing page
5      1092581226  10/7/2021 1:09:04 PM  paid search     unknow
6      1092581226  10/7/2021 1:09:05 PM  email           price reduced

I want a result like this:

rowno  visitor_id  datetime              channel         visit_page
1      2643331144  10/3/2021 4:05:29 PM  email           landing page
2      1092581226  10/7/2021 1:08:12 PM  email           price reduced

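I used the query below, but it removes too many visitors from the total. Without the partition, though, the total is double-counted, because the same visitor has multiple channels and pages within the same session.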

with T as
(select *, row_number() over (partition by visitor_id order by datetime asc) as rank
from table A)

select distinct visitor_id, channel, visit_page
from T
where rank=1

If the only problem is the rownum in the final output, you can "recompute" it in the final select with row_number() over (order by datetime asc) as rownum:

with cte (
    visitor_id,
    datetime,
    channel,
    visit_page
) as (
    -- sample data; datetime is a text literal here, so sorting on it is only
    -- chronological for this sample. Sort on a real timestamp column in practice.
    values
        (2643331144, '10/3/2021 4:05:29 PM', 'email', 'landing page'),
        (2643331144, '10/3/2021 4:05:39 PM', 'organic search', 'landing page'),
        (1092581226, '10/7/2021 1:08:12 PM', 'email', 'price reduced'),
        (1092581226, '10/7/2021 1:08:44 PM', 'organic search', 'landing page'),
        (1092581226, '10/7/2021 1:09:04 PM', 'paid search', 'unknow'),
        (1092581226, '10/7/2021 1:09:05 PM', 'email', 'price reduced')
)

select row_number() over (order by datetime asc) as rownum,
    visitor_id,
    datetime,
    channel,
    visit_page
from (
        -- inlined your WITH clause into subquery
        select *,
            row_number() over (
                partition by visitor_id
                order by datetime asc
            ) as rank
        from cte
    ) as ranked -- some databases require an alias on a derived table
where rank = 1

Output:

rownum  visitor_id  datetime              channel         visit_page
1       2643331144  10/3/2021 4:05:29 PM  email           landing page
2       1092581226  10/7/2021 1:08:12 PM  email           price reduced
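
On the worry that visitors are being over-removed: one way to check is to compare the number of distinct visitor_id values with the number of rows that survive the row_number() filter. The two should match, because the filter keeps exactly one row per visitor_id. This is only a sketch against the same sample data. The names ranked, rn, distinct_visitors and deduplicated_rows are made up for illustration, and the syntax may need small adjustments for your database.

```sql
with cte (visitor_id, datetime, channel, visit_page) as (
    -- same sample rows as above (text datetimes, for illustration only)
    values
        (2643331144, '10/3/2021 4:05:29 PM', 'email', 'landing page'),
        (2643331144, '10/3/2021 4:05:39 PM', 'organic search', 'landing page'),
        (1092581226, '10/7/2021 1:08:12 PM', 'email', 'price reduced'),
        (1092581226, '10/7/2021 1:08:44 PM', 'organic search', 'landing page'),
        (1092581226, '10/7/2021 1:09:04 PM', 'paid search', 'unknow'),
        (1092581226, '10/7/2021 1:09:05 PM', 'email', 'price reduced')
),
ranked as (
    -- number each visitor's rows from earliest to latest
    select *,
        row_number() over (
            partition by visitor_id
            order by datetime asc
        ) as rn
    from cte
)
select
    -- how many visitors exist at all
    (select count(distinct visitor_id) from cte) as distinct_visitors,
    -- how many rows remain after keeping the earliest row per visitor
    (select count(*) from ranked where rn = 1) as deduplicated_rows
```

For the sample data both columns come back as 2. If the numbers differ on your real table, the loss is probably happening somewhere other than the rank = 1 filter, for example in a join or in a where clause applied before the ranking.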