Trying to aggregate data using a mixture of group by, rank and dense rank with no luck
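I'm wrestling with some truly awful legacy data sets and need to aggregate the data to make it more useful. I'm not sure whether I need rank, dense_rank, group by, a combination of the three, or something else entirely.
The data is structured as follows: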
--[Table:]
hashed_id | visit_id | datetime | page_name | ...
----------+----------+---------------------+-----------+-----
abc | 1 | 2019-01-01 00:00:01 | page1 | ...
abc | 1 | 2019-01-01 00:00:02 | page1 | ...
abc | 1 | 2019-01-01 00:00:03 | page1 | ...
abc | 1 | 2019-01-01 00:00:10 | page1 | ...
abc | 1 | 2019-01-01 00:00:20 | page2 | ...
abc | 1 | 2019-01-01 00:00:32 | page2 | ...
abc | 1 | 2019-01-01 00:00:53 | page1 | ...
abc | 1 | 2019-01-01 00:00:54 | page1 | ...
What I want is:
--[Table:]
hashed_id | visit_id | datetime | page_name | ...
----------+----------+---------------------+-----------+-----
abc | 1 | 2019-01-01 00:00:01 | page1 | ...
abc | 1 | 2019-01-01 00:00:20 | page2 | ...
abc | 1 | 2019-01-01 00:00:53 | page1 | ...
I've tried rank, dense_rank and group by, but I can't seem to get the result I want. Am I being an idiot :)?
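Ranking functions and group by can be combined to solve this with the "gaps and islands" technique. Neither answer here uses it. Take row_number() over the whole visit and subtract row_number() computed per page. The difference stays constant within each consecutive run of the same page, so grouping on it collapses each run to a single row.

This is a minimal sketch built on a few assumptions. It uses SQLite 3.25 or later, because window functions are needed, run through Python's sqlite3 module. The table is named my_table (the name the first answer uses) and holds only the four columns shown. The datetime column is stored as ISO-8601 text, so string order matches time order. The helper name first_row_per_page_run is hypothetical.

```python
import sqlite3

# Sample rows from the question; the trailing "..." columns are left out.
ROWS = [
    ("abc", 1, "2019-01-01 00:00:01", "page1"),
    ("abc", 1, "2019-01-01 00:00:02", "page1"),
    ("abc", 1, "2019-01-01 00:00:03", "page1"),
    ("abc", 1, "2019-01-01 00:00:10", "page1"),
    ("abc", 1, "2019-01-01 00:00:20", "page2"),
    ("abc", 1, "2019-01-01 00:00:32", "page2"),
    ("abc", 1, "2019-01-01 00:00:53", "page1"),
    ("abc", 1, "2019-01-01 00:00:54", "page1"),
]

# Gaps and islands: inside a run of consecutive rows on the same page, both
# row numbers grow by 1 per row, so their difference (grp) stays constant.
# Grouping on (page_name, grp) therefore collapses each run to one row.
ISLANDS_QUERY = """
select hashed_id, visit_id, min(datetime) as datetime, page_name
from (
    select m.*,
           row_number() over (partition by hashed_id, visit_id
                              order by datetime)
         - row_number() over (partition by hashed_id, visit_id, page_name
                              order by datetime) as grp
    from my_table m
) s
group by hashed_id, visit_id, page_name, grp
order by hashed_id, visit_id, datetime
"""


def first_row_per_page_run():
    """Return the earliest row of each consecutive run of the same page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table my_table "
            "(hashed_id text, visit_id integer, datetime text, page_name text)"
        )
        conn.executemany("insert into my_table values (?, ?, ?, ?)", ROWS)
        return conn.execute(ISLANDS_QUERY).fetchall()
    finally:
        conn.close()


for row in first_row_per_page_run():
    print(row)
```

On the sample data, the runs get grp values 0 (the first four page1 rows), 4 (the two page2 rows) and 2 (the last two page1 rows). That yields the three rows the question asks for. Plain rank() or dense_rank() over page_name cannot do this on its own, because the two separate page1 runs would share a rank.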
It looks like you need to join your table with the min(datetime) grouped by hashed_id, visit_id:
select * from my_table m
inner join (
    -- earliest timestamp for each (hashed_id, visit_id)
    select hashed_id, visit_id, min(datetime) min_date
    from my_table
    group by hashed_id, visit_id
) t on t.hashed_id = m.hashed_id
   and t.visit_id = m.visit_id
   and t.min_date = m.datetime
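A quick way to see what this join returns is to load the sample rows into an in-memory SQLite database and run it from Python's sqlite3 module. The question does not name its database, so SQLite is an assumption here, and the helper name run_min_date_join is hypothetical. The select list is narrowed to my_table's four columns so the joined min_date column is left out.

```python
import sqlite3

# Sample rows from the question; the trailing "..." columns are left out.
ROWS = [
    ("abc", 1, "2019-01-01 00:00:01", "page1"),
    ("abc", 1, "2019-01-01 00:00:02", "page1"),
    ("abc", 1, "2019-01-01 00:00:03", "page1"),
    ("abc", 1, "2019-01-01 00:00:10", "page1"),
    ("abc", 1, "2019-01-01 00:00:20", "page2"),
    ("abc", 1, "2019-01-01 00:00:32", "page2"),
    ("abc", 1, "2019-01-01 00:00:53", "page1"),
    ("abc", 1, "2019-01-01 00:00:54", "page1"),
]

# The join from the answer, with the output limited to my_table's columns.
MIN_DATE_JOIN = """
select m.hashed_id, m.visit_id, m.datetime, m.page_name
from my_table m
inner join (
    select hashed_id, visit_id, min(datetime) as min_date
    from my_table
    group by hashed_id, visit_id
) t on t.hashed_id = m.hashed_id
   and t.visit_id = m.visit_id
   and t.min_date = m.datetime
"""


def run_min_date_join():
    """Load the sample rows into an in-memory SQLite table and run the join."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table my_table "
            "(hashed_id text, visit_id integer, datetime text, page_name text)"
        )
        conn.executemany("insert into my_table values (?, ?, ?, ?)", ROWS)
        return conn.execute(MIN_DATE_JOIN).fetchall()
    finally:
        conn.close()


print(run_min_date_join())
```

The minimum is taken over the whole (hashed_id, visit_id) group, so the join keeps only the earliest row of each visit. On the sample data it returns the 00:00:01 row alone, not the three rows the question asks for.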
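Use lag() to keep only the rows whose page differs from the previous page in the same visit. This gives the time at which each new page first appears: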
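select t.*
from (select t.*,
             -- page_name of the previous row in the same visit (NULL for the first row)
             lag(page_name) over (partition by hashed_id, visit_id order by datetime) as prev_page_name
      from t
     ) t
-- keep the first row of each visit and every row where the page changes
where prev_page_name is null or prev_page_name <> page_name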
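The lag() query can be checked against the sample rows with a minimal sketch. It assumes SQLite 3.25 or later (for lag()), driven from Python's sqlite3 module, with the table named t as in the answer. The helper name page_changes is hypothetical. The outer select lists the four columns explicitly instead of t.*, so the helper column prev_page_name is left out of the result.

```python
import sqlite3

# Sample rows from the question; the trailing "..." columns are left out.
ROWS = [
    ("abc", 1, "2019-01-01 00:00:01", "page1"),
    ("abc", 1, "2019-01-01 00:00:02", "page1"),
    ("abc", 1, "2019-01-01 00:00:03", "page1"),
    ("abc", 1, "2019-01-01 00:00:10", "page1"),
    ("abc", 1, "2019-01-01 00:00:20", "page2"),
    ("abc", 1, "2019-01-01 00:00:32", "page2"),
    ("abc", 1, "2019-01-01 00:00:53", "page1"),
    ("abc", 1, "2019-01-01 00:00:54", "page1"),
]

# A row starts a new run when it has no predecessor in its visit
# (prev_page_name is NULL) or its page differs from the previous row's page.
LAG_QUERY = """
select hashed_id, visit_id, datetime, page_name
from (select t.*,
             lag(page_name) over (partition by hashed_id, visit_id
                                  order by datetime) as prev_page_name
      from t
     ) t
where prev_page_name is null or prev_page_name <> page_name
order by hashed_id, visit_id, datetime
"""


def page_changes():
    """Return the rows where the page differs from the previous row's page."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t "
            "(hashed_id text, visit_id integer, datetime text, page_name text)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", ROWS)
        return conn.execute(LAG_QUERY).fetchall()
    finally:
        conn.close()


for row in page_changes():
    print(row)
```

Only one pass over the ordered rows is needed, and no self-join is involved. This is why lag() suits "first row of each run" problems well.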