SQL - How to select x number of rows prior to a specific row
I have this table:
ts | user_id | event |
-------------------------------
1500 a eat
1501 a walk
1502 a sleep
1500 b eat
1501 b sleep
1502 b wake
1500 c walk
1501 c eat
1502 c sit
1503 c sleep
1504 c wake
So I want to select x number of rows prior to a certain event. For example, say I want to select the 2 events before each sleep.
My final table result should look like this:
user_id | event | rank |
--------------------------------
a eat 1
a walk 2
a sleep 3
b NULL 0
b eat 1
b sleep 2
c eat 2
c sit 3
c sleep 4
How can I do this in SQL (specifically Redshift SQL)?
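This is a gaps-and-islands problem, where you want the last three rows of each island: the sleep event and the two events before it.
Probably the safest approach is to use a window sum of the sleep events to define the groups, and then filter with row_number():
select *
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end)
                over(partition by user_id order by ts desc) grp
        from mytable t
    ) t
) t
where rn_desc <= 3 and grp > 0
order by user_id, ts
We define the islands with a descending window sum of the "sleep" events, so each island ends with a sleep event and the rows after a user's last sleep get grp = 0. Then we enumerate the rows of each island in ascending and descending order and filter the records we are interested in: rn_desc <= 3 keeps the sleep event plus the two rows before it, and rn_asc gives the rank within the island.
ts | user_id | event | grp | rn_asc | rn_desc
---: | :------ | :---- | --: | -----: | ------:
1500 | a | eat | 1 | 1 | 3
1501 | a | walk | 1 | 2 | 2
1502 | a | sleep | 1 | 3 | 1
1500 | b | eat | 1 | 1 | 2
1501 | b | sleep | 1 | 2 | 1
1501 | c | eat | 1 | 2 | 3
1502 | c | sit | 1 | 3 | 2
1503 | c | sleep | 1 | 4 | 1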
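A minimal sketch for trying the islands approach locally. It assumes Python's built-in sqlite3 module with SQLite 3.25 or later (for window functions) as a stand-in for Redshift, so it is not a Redshift test. The table name mytable and the helpers make_db and rows_before_sleep_islands are hypothetical. The filter rn_desc <= x + 1 keeps each sleep row plus up to x earlier rows from the same island, and rn_asc plays the role of the rank column.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

# grp is a running count of 'sleep' events taken in descending ts order, so
# every island with grp > 0 ends at a 'sleep' row (rn_desc = 1 in that island)
# and rows after a user's last 'sleep' get grp = 0.
ISLANDS_SQL = """
select user_id, event, rn_asc
from (
    select t.*,
        row_number() over (partition by user_id, grp order by ts) as rn_asc,
        row_number() over (partition by user_id, grp order by ts desc) as rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end)
                over (partition by user_id order by ts desc) as grp
        from mytable t
    ) t
) t
where rn_desc <= ? and grp > 0
order by user_id, ts
"""


def make_db():
    """Create an in-memory table named mytable (hypothetical name) with ROWS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    return conn


def rows_before_sleep_islands(conn, x):
    """Return (user_id, event, rn_asc) for each 'sleep' row and up to x
    earlier rows from the same island (hypothetical helper)."""
    # rn_desc <= x + 1 keeps the 'sleep' row itself plus x preceding rows.
    return conn.execute(ISLANDS_SQL, (x + 1,)).fetchall()


if __name__ == "__main__":
    for row in rows_before_sleep_islands(make_db(), 2):
        print(row)
```

Because each island stops at the previous sleep event, a sleep that directly follows another sleep gets fewer than x earlier rows.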
Edit
Redshift requires an explicit window frame when an aggregate window function such as sum() has an ORDER BY clause (ranking functions such as row_number() take no frame clause). So it is a bit longer to type:
select *
from (
    select t.*,
        row_number() over(partition by user_id, grp order by ts) rn_asc,
        row_number() over(partition by user_id, grp order by ts desc) rn_desc
    from (
        select t.*,
            sum(case when event = 'sleep' then 1 else 0 end) over(
                partition by user_id
                order by ts desc
                rows between unbounded preceding and current row
            ) grp
        from mytable t
    ) t
) t
where rn_desc <= 3 and grp > 0
order by user_id, ts
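As a quick sanity check that the explicit ROWS frame does not change the group numbers on this data, here is a sketch that computes grp both ways. It again assumes SQLite through Python's sqlite3 instead of Redshift (SQLite accepts both the default and the explicit frame); the table name mytable and the helper compare_frames are hypothetical. The two results could only differ if one user had two rows with the same ts, because the default frame in SQLite (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) treats tied rows as peers.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

# grp computed twice: with SQLite's default frame and with the explicit
# ROWS frame that Redshift requires for an ordered aggregate window.
FRAME_SQL = """
select user_id, ts,
    sum(case when event = 'sleep' then 1 else 0 end)
        over (partition by user_id order by ts desc) as grp_default,
    sum(case when event = 'sleep' then 1 else 0 end)
        over (partition by user_id order by ts desc
              rows between unbounded preceding and current row) as grp_rows
from mytable
order by user_id, ts
"""


def compare_frames():
    """Return (user_id, ts, grp_default, grp_rows) rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    return conn.execute(FRAME_SQL).fetchall()


if __name__ == "__main__":
    for row in compare_frames():
        print(row)
```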
Hmm . . . you can use lead():
select t.*
from (select t.*,
lead(event) over (partition by user_id order by ts) as next_event,
lead(event, 2) over (partition by user_id order by ts) as next_event2
from t
) t
where 'sleep' in (event, next_event, next_event2);
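Note: this only returns rows that exist in the data. If you need to generate rows (such as the b NULL 0 row in the desired result), you need additional logic.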
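A sketch of the lead() approach under the same assumptions: SQLite through Python's sqlite3 as a stand-in for Redshift, a hypothetical table name mytable instead of t, and a hypothetical helper rows_before_sleep_lead. A row is kept when it, or one of the next two rows of the same user, is a sleep event. When lead() runs past the end of a partition it returns NULL, which never matches 'sleep', so those rows are simply filtered out.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]

# Look ahead one and two rows per user; keep the row if any of the three
# events is 'sleep'. NULLs from lead() past the partition end never match.
LEAD_SQL = """
select user_id, ts, event
from (
    select t.*,
        lead(event) over (partition by user_id order by ts) as next_event,
        lead(event, 2) over (partition by user_id order by ts) as next_event2
    from mytable t
) t
where 'sleep' in (event, next_event, next_event2)
order by user_id, ts
"""


def rows_before_sleep_lead():
    """Return (user_id, ts, event) rows (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    return conn.execute(LEAD_SQL).fetchall()


if __name__ == "__main__":
    for row in rows_before_sleep_lead():
        print(row)
```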
Edit:
You can actually generalize this:
select t.*
from (select t.*,
             sum(case when event = 'sleep' then 1 else 0 end) over (partition by user_id order by ts rows between current row and 2 following) as cnt_sleep
      from t
     ) t
where cnt_sleep > 0;
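This counts the "sleep" events in a window of n rows: the current row and the n - 1 rows that follow it (here n = 3, from rows between current row and 2 following). If "sleep" is found in any of them, the row is returned.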
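A sketch of the generalized version with x as a parameter, under the same assumptions: SQLite through Python's sqlite3 as a stand-in for Redshift, a hypothetical table name mytable, and a hypothetical helper rows_before_sleep_frame. The frame offset is formatted into the SQL text after checking that x is a non-negative integer, because this sketch does not rely on bound parameters being accepted inside a frame clause.

```python
import sqlite3

# Sample data from the question: (ts, user_id, event).
ROWS = [
    (1500, "a", "eat"), (1501, "a", "walk"), (1502, "a", "sleep"),
    (1500, "b", "eat"), (1501, "b", "sleep"), (1502, "b", "wake"),
    (1500, "c", "walk"), (1501, "c", "eat"), (1502, "c", "sit"),
    (1503, "c", "sleep"), (1504, "c", "wake"),
]


def rows_before_sleep_frame(x):
    """Return (user_id, ts, event) for each 'sleep' row and up to x rows
    before it for the same user (hypothetical helper)."""
    # The offset is written into the SQL text, so validate it first.
    if isinstance(x, bool) or not isinstance(x, int) or x < 0:
        raise ValueError("x must be a non-negative integer")
    # The frame covers the current row plus the next x rows (x + 1 in total).
    sql = f"""
        select user_id, ts, event
        from (
            select t.*,
                sum(case when event = 'sleep' then 1 else 0 end) over (
                    partition by user_id order by ts
                    rows between current row and {x} following
                ) as cnt_sleep
            from mytable t
        ) t
        where cnt_sleep > 0
        order by user_id, ts
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table mytable (ts integer, user_id text, event text)")
    conn.executemany("insert into mytable values (?, ?, ?)", ROWS)
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    for row in rows_before_sleep_frame(2):
        print(row)
```

With x = 2 this should return the same rows as the lead() query. Unlike the islands query, the window is not cut off at an earlier sleep event, so the x rows before a sleep may themselves include an earlier sleep.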