Trying to aggregate data using a mixture of group by, rank and dense rank with no luck

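I'm fighting with some pretty horrible legacy datasets and need to aggregate the data to make it more useful. I'm not sure whether I need rank, dense_rank, group by, or a combination of the three (or something new).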

The data is structured as follows:

hashed_id | visit_id | datetime            | page_name | ...
----------+----------+---------------------+-----------+-----
abc       | 1        | 2019-01-01 00:00:01 | page1     | ...
abc       | 1        | 2019-01-01 00:00:02 | page1     | ...
abc       | 1        | 2019-01-01 00:00:03 | page1     | ...
abc       | 1        | 2019-01-01 00:00:10 | page1     | ...
abc       | 1        | 2019-01-01 00:00:20 | page2     | ...
abc       | 1        | 2019-01-01 00:00:32 | page2     | ...
abc       | 1        | 2019-01-01 00:00:53 | page1     | ...
abc       | 1        | 2019-01-01 00:00:54 | page1     | ...

What I would like is:

hashed_id | visit_id | datetime            | page_name | ...
----------+----------+---------------------+-----------+-----
abc       | 1        | 2019-01-01 00:00:01 | page1     | ...
abc       | 1        | 2019-01-01 00:00:20 | page2     | ...
abc       | 1        | 2019-01-01 00:00:53 | page1     | ... 

I've tried using rank, dense rank and group by, but can't seem to get the desired result. Am I being an idiot :)?

Looking at your data, it seems you need a join between the table and the min(datetime) grouped by hashed_id, visit_id:

select *
from my_table m
inner join (
  -- earliest timestamp for each hashed_id / visit_id pair
  select hashed_id, visit_id, min(datetime) as min_date
  from my_table
  group by hashed_id, visit_id
) t on t.hashed_id = m.hashed_id
   and t.visit_id = m.visit_id
   and t.min_date = m.datetime
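
To see what the join against min(datetime) returns, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database and runs the query. SQLite and the Python wrapper are assumptions made purely for illustration (the question does not say which database is in use); the table name my_table is taken from the answer above.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for the real one.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table "
    "(hashed_id text, visit_id integer, datetime text, page_name text)"
)

# Sample rows copied from the question.
rows = [
    ("abc", 1, "2019-01-01 00:00:01", "page1"),
    ("abc", 1, "2019-01-01 00:00:02", "page1"),
    ("abc", 1, "2019-01-01 00:00:03", "page1"),
    ("abc", 1, "2019-01-01 00:00:10", "page1"),
    ("abc", 1, "2019-01-01 00:00:20", "page2"),
    ("abc", 1, "2019-01-01 00:00:32", "page2"),
    ("abc", 1, "2019-01-01 00:00:53", "page1"),
    ("abc", 1, "2019-01-01 00:00:54", "page1"),
]
conn.executemany("insert into my_table values (?, ?, ?, ?)", rows)

# Join each row to the earliest timestamp of its hashed_id / visit_id group.
query = """
select m.hashed_id, m.visit_id, m.datetime, m.page_name
from my_table m
inner join (
  select hashed_id, visit_id, min(datetime) as min_date
  from my_table
  group by hashed_id, visit_id
) t on t.hashed_id = m.hashed_id
   and t.visit_id = m.visit_id
   and t.min_date = m.datetime
"""
first_rows = conn.execute(query).fetchall()
for row in first_rows:
    print(row)
```

On this sample the join keeps only the earliest row of the visit (the 00:00:01 page1 row), since the grouping is per hashed_id, visit_id and does not look at page changes within a visit.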

Use lag() to get the first time a page appears that differs from the previous page:

select t.*
from (select t.*,
             lag(page_name) over (partition by hashed_id, visit_id order by datetime) as prev_page_name
      from t
     ) t
where prev_page_name is null or prev_page_name <> page_name
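
As a quick check, the lag() query can be run against the sample rows from the question. The sketch below assumes an in-memory SQLite database (version 3.25 or later, which is needed for window functions) and a table named my_table; both are illustrative choices, not something stated in the thread.

```python
import sqlite3

# Assumption: in-memory SQLite (3.25+ for window functions) stands in
# for the real database, and the table is called my_table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table my_table "
    "(hashed_id text, visit_id integer, datetime text, page_name text)"
)

# Sample rows copied from the question.
rows = [
    ("abc", 1, "2019-01-01 00:00:01", "page1"),
    ("abc", 1, "2019-01-01 00:00:02", "page1"),
    ("abc", 1, "2019-01-01 00:00:03", "page1"),
    ("abc", 1, "2019-01-01 00:00:10", "page1"),
    ("abc", 1, "2019-01-01 00:00:20", "page2"),
    ("abc", 1, "2019-01-01 00:00:32", "page2"),
    ("abc", 1, "2019-01-01 00:00:53", "page1"),
    ("abc", 1, "2019-01-01 00:00:54", "page1"),
]
conn.executemany("insert into my_table values (?, ?, ?, ?)", rows)

# Keep a row when it is the first of its visit (no previous page)
# or when its page differs from the page on the previous row.
query = """
select hashed_id, visit_id, datetime, page_name
from (select m.*,
             lag(page_name) over (partition by hashed_id, visit_id
                                  order by datetime) as prev_page_name
      from my_table m
     ) t
where prev_page_name is null or prev_page_name <> page_name
order by hashed_id, visit_id, datetime
"""
page_changes = conn.execute(query).fetchall()
for row in page_changes:
    print(row)
```

On the sample data this yields the three rows asked for in the question: page1 at 00:00:01, page2 at 00:00:20, and page1 again at 00:00:53.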