How to retrieve rows using SQL Lag() function with condition
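I have been trying to solve this issue but am not getting the desired result. Kindly help, as I have been at it for several days now.
I have the table below, where the click_time and fetches_before_click columns are my desired results.
Calculate the time difference in the unix_time_seconds column between a 'click' action and the 'fetch' action with flag = 2 that precedes the click.
Count the number of rows between the 'click' and the previous 'fetch' with flag = 2 that satisfies the condition above. The fetches_before_click column shows how many 'fetch' events the user made before the 'click' event.
The click_time column shows only the difference between the 'click' row and the first row before the click that has flag = 2 and event = fetch.
I used the statement below, but I am not sure how to go backward, find the first row that is a 'fetch' with flag 2, and subtract that row's value from the click's. I used the lag() function to read the previous rows, but it cannot reach the previous row that has flag = 2 and event = fetch.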
select time, unix_time_seconds,event,flag,
case when event = "click"
then unix_time_seconds - lag(unix_time_seconds, 1 ) over (order by flag desc)
end as click_time
from table_event
time | unix_time_seconds | event | flag | click_time | fetches_before_click |
---|---|---|---|---|---|
1/2/22 3:52 | 1641095536 | fetch | 2 | | |
1/2/22 3:52 | 1641095539 | click | 0 | 3 | 1 |
1/2/22 4:59 | 1641099553 | fetch | 2 | | |
1/2/22 4:59 | 1641099561 | fetch | 1 | | |
1/2/22 4:59 | 1641099568 | fetch | 1 | | |
1/2/22 4:59 | 1641099575 | fetch | 1 | | |
1/2/22 6:51 | 1641106302 | fetch | 2 | | |
1/2/22 6:51 | 1641106317 | fetch | 1 | | |
1/2/22 6:51 | 1641106319 | click | 0 | 17 | 2 |
1/3/22 6:15 | 1641190520 | fetch | 2 | | |
1/7/22 8:12 | 1641543135 | fetch | 2 | | |
1/10/22 1:09 | 1641776996 | fetch | 2 | | |
1/10/22 1:09 | 1641776997 | click | 0 | 1 | 1 |
1/10/22 1:12 | 1641777179 | fetch | 2 | | |
1/10/22 1:13 | 1641777181 | click | 0 | 2 | 1 |
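Just adding partition by (case when event='click' or flag=2 then 1 end) to the window function will serve the purpose.
Rows where event is 'click' or flag is 2 get partition key 1. Every other row (a 'fetch' whose flag is not 2) gets a NULL key and falls into a separate partition, so a click row never sees it.
Instead of all fifteen rows, the lag() window function will only consider the 11 rows below.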
time | unix_time_seconds | event | flag |
---|---|---|---|
2022-02-01 03:52:16 | 1641095536 | fetch | 2 |
2022-02-01 03:52:19 | 1641095539 | click | 0 |
2022-02-01 04:59:13 | 1641099553 | fetch | 2 |
2022-02-01 06:51:42 | 1641106302 | fetch | 2 |
2022-02-01 06:51:59 | 1641106319 | click | 0 |
2022-03-01 06:15:20 | 1641190520 | fetch | 2 |
2022-07-01 08:12:15 | 1641543135 | fetch | 2 |
2022-10-01 01:09:56 | 1641776996 | fetch | 2 |
2022-10-01 01:09:57 | 1641776997 | click | 0 |
2022-10-01 01:12:59 | 1641777179 | fetch | 2 |
2022-10-01 01:13:01 | 1641777181 | click | 0 |
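As a quick check of the 11-row claim, the partition key can be computed for every row and counted. This is only a sketch: it runs the answer's CASE expression through Python's built-in sqlite3 module rather than the answer's own database, the time column is left out because only unix_time_seconds, event and flag matter here, and the names part_key and partition_sizes are made up for illustration.

```python
import sqlite3

# Rows from the question: (unix_time_seconds, event, flag).
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table_event (unix_time_seconds int, event text, flag int)")
conn.executemany("insert into table_event values (?, ?, ?)", ROWS)

# Group by the same expression the answer uses as its partition key.
# Rows that fail the condition get a NULL key (None in Python).
partition_sizes = conn.execute("""
    select case when event = 'click' or (event = 'fetch' and flag = 2) then 1 end as part_key,
           count(*) as n
    from table_event
    group by part_key
    order by part_key
""").fetchall()

print(partition_sizes)
```

With the question's data this prints [(None, 4), (1, 11)]: the four flag = 1 fetches sit in the NULL partition and the remaining 11 rows share partition 1.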
Schema and insert statements:
create table table_event(time datetime, unix_time_seconds int, event varchar(10), flag int);
insert into table_event values('22/2/1 3:52:16', 1641095536, 'fetch', 2);
insert into table_event values('22/2/1 3:52:19', 1641095539, 'click', 0);
insert into table_event values('22/2/1 4:59:13', 1641099553, 'fetch', 2);
insert into table_event values('22/2/1 4:59:21', 1641099561, 'fetch', 1);
insert into table_event values('22/2/1 4:59:28', 1641099568, 'fetch', 1);
insert into table_event values('22/2/1 4:59:35', 1641099575, 'fetch', 1);
insert into table_event values('22/2/1 6:51:42', 1641106302, 'fetch', 2);
insert into table_event values('22/2/1 6:51:57', 1641106317, 'fetch', 1);
insert into table_event values('22/2/1 6:51:59', 1641106319, 'click', 0);
insert into table_event values('22/3/1 6:15:20', 1641190520, 'fetch', 2);
insert into table_event values('22/7/1 8:12:15', 1641543135, 'fetch', 2);
insert into table_event values('22/10/1 1:09:56', 1641776996, 'fetch', 2);
insert into table_event values('22/10/1 1:09:57', 1641776997, 'click', 0);
insert into table_event values('22/10/1 1:12:59', 1641777179, 'fetch', 2);
insert into table_event values('22/10/1 1:13:01', 1641777181, 'click', 0);
Query:
select time, unix_time_seconds, event, flag,
case when event = 'click'
then unix_time_seconds - lag(unix_time_seconds, 1) over (partition by (case when event='click' or (event='fetch' and flag=2) then 1 end) order by time)
end as click_time
from table_event
order by unix_time_seconds
Output:
time | unix_time_seconds | event | flag | click_time |
---|---|---|---|---|
2022-02-01 03:52:16 | 1641095536 | fetch | 2 | null |
2022-02-01 03:52:19 | 1641095539 | click | 0 | 3 |
2022-02-01 04:59:13 | 1641099553 | fetch | 2 | null |
2022-02-01 04:59:21 | 1641099561 | fetch | 1 | null |
2022-02-01 04:59:28 | 1641099568 | fetch | 1 | null |
2022-02-01 04:59:35 | 1641099575 | fetch | 1 | null |
2022-02-01 06:51:42 | 1641106302 | fetch | 2 | null |
2022-02-01 06:51:57 | 1641106317 | fetch | 1 | null |
2022-02-01 06:51:59 | 1641106319 | click | 0 | 17 |
2022-03-01 06:15:20 | 1641190520 | fetch | 2 | null |
2022-07-01 08:12:15 | 1641543135 | fetch | 2 | null |
2022-10-01 01:09:56 | 1641776996 | fetch | 2 | null |
2022-10-01 01:09:57 | 1641776997 | click | 0 | 1 |
2022-10-01 01:12:59 | 1641777179 | fetch | 2 | null |
2022-10-01 01:13:01 | 1641777181 | click | 0 | 2 |
db<>fiddle here
This should do the trick:
select time, unix_time_seconds, event, flag,
case when event = 'click' then
unix_time_seconds - (select top 1 y.unix_time_seconds
from table_event y
where y.event = 'fetch' and y.flag = 2
and y.unix_time_seconds <= z.unix_time_seconds
order by y.unix_time_seconds desc)
end as click_time
from table_event z
order by unix_time_seconds
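Note that select top 1 is SQL Server syntax. On engines without it, the same correlated lookup can be expressed with max(), since the latest qualifying fetch is also the one with the largest unix_time_seconds. The sketch below is an assumption-laden variant, not the answer's own query: it runs under Python's sqlite3 module, omits the time column, and the variable names are illustrative.

```python
import sqlite3

# Rows from the question: (unix_time_seconds, event, flag).
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table_event (unix_time_seconds int, event text, flag int)")
conn.executemany("insert into table_event values (?, ?, ?)", ROWS)

# For each click, subtract the latest flag = 2 fetch at or before it.
# max() replaces "top 1 ... order by ... desc" from the answer.
result = conn.execute("""
    select z.unix_time_seconds,
           case when z.event = 'click' then
               z.unix_time_seconds - (select max(y.unix_time_seconds)
                                      from table_event y
                                      where y.event = 'fetch' and y.flag = 2
                                        and y.unix_time_seconds <= z.unix_time_seconds)
           end as click_time
    from table_event z
    order by z.unix_time_seconds
""").fetchall()

# Keep only the click rows, whose click_time is not NULL.
click_times = [ct for _, ct in result if ct is not None]
print(click_times)
```

For the question's data this prints [3, 17, 1, 2], matching the desired click_time column.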
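There are multiple solutions to this question; here is one of them.
Please note the use of values to create an ad-hoc dataset.
Another approach would be to use the stack function.
Also note the use of a timestamp literal, e.g. timestamp '2022-01-02 03:52:16'
The outer CASE statement is there so that click_time values are displayed only for click events.
The window function orders the records by unix_time_seconds and, for each record, takes the maximum unix_time_seconds up to and including that record. An order by with no explicit frame implies range between unbounded preceding and current row, which here behaves the same as rows between unbounded preceding and current row because the unix_time_seconds values are unique.
The CASE statement inside the window function makes sure that we only look at fetch events with flag 2.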
with t (time, unix_time_seconds, event, flag)
as
(
select *
from values (timestamp '2022-01-02 03:52:16', 1641095536, 'fetch', 2)
,(timestamp '2022-01-02 03:52:19', 1641095539, 'click', 0)
,(timestamp '2022-01-02 04:59:13', 1641099553, 'fetch', 2)
,(timestamp '2022-01-02 04:59:21', 1641099561, 'fetch', 1)
,(timestamp '2022-01-02 04:59:28', 1641099568, 'fetch', 1)
,(timestamp '2022-01-02 04:59:35', 1641099575, 'fetch', 1)
,(timestamp '2022-01-02 06:51:42', 1641106302, 'fetch', 2)
,(timestamp '2022-01-02 06:51:57', 1641106317, 'fetch', 1)
,(timestamp '2022-01-02 06:51:59', 1641106319, 'click', 0)
,(timestamp '2022-01-03 06:15:20', 1641190520, 'fetch', 2)
,(timestamp '2022-01-07 08:12:15', 1641543135, 'fetch', 2)
,(timestamp '2022-01-10 01:09:56', 1641776996, 'fetch', 2)
,(timestamp '2022-01-10 01:09:57', 1641776997, 'click', 0)
,(timestamp '2022-01-10 01:12:59', 1641777179, 'fetch', 2)
,(timestamp '2022-01-10 01:13:01', 1641777181, 'click', 0)
)
select *
      ,case
           when event = 'click'
           then unix_time_seconds
              - max(case
                        when event = 'fetch' and flag = 2
                        then unix_time_seconds
                    end
                   ) over (order by unix_time_seconds)
       end as click_time
from t
time | unix_time_seconds | event | flag | click_time |
---|---|---|---|---|
2022-01-02T03:52:16.000+0000 | 1641095536 | fetch | 2 | null |
2022-01-02T03:52:19.000+0000 | 1641095539 | click | 0 | 3 |
2022-01-02T04:59:13.000+0000 | 1641099553 | fetch | 2 | null |
2022-01-02T04:59:21.000+0000 | 1641099561 | fetch | 1 | null |
2022-01-02T04:59:28.000+0000 | 1641099568 | fetch | 1 | null |
2022-01-02T04:59:35.000+0000 | 1641099575 | fetch | 1 | null |
2022-01-02T06:51:42.000+0000 | 1641106302 | fetch | 2 | null |
2022-01-02T06:51:57.000+0000 | 1641106317 | fetch | 1 | null |
2022-01-02T06:51:59.000+0000 | 1641106319 | click | 0 | 17 |
2022-01-03T06:15:20.000+0000 | 1641190520 | fetch | 2 | null |
2022-01-07T08:12:15.000+0000 | 1641543135 | fetch | 2 | null |
2022-01-10T01:09:56.000+0000 | 1641776996 | fetch | 2 | null |
2022-01-10T01:09:57.000+0000 | 1641776997 | click | 0 | 1 |
2022-01-10T01:12:59.000+0000 | 1641777179 | fetch | 2 | null |
2022-01-10T01:13:01.000+0000 | 1641777181 | click | 0 | 2 |
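The running maximum described above can be made visible by selecting it as its own column, once with the default frame and once with an explicit rows frame. The following is a sketch under assumptions: it uses Python's sqlite3 module (window functions need SQLite 3.25 or newer) instead of Spark, drops the time column, and the aliases last_fetch_default and last_fetch_rows are invented for the example.

```python
import sqlite3

# Rows from the question: (unix_time_seconds, event, flag).
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table_event (unix_time_seconds int, event text, flag int)")
conn.executemany("insert into table_event values (?, ?, ?)", ROWS)

# Running max of flag = 2 fetch times: default frame vs. explicit rows frame.
result = conn.execute("""
    select unix_time_seconds, event,
           max(case when event = 'fetch' and flag = 2 then unix_time_seconds end)
               over (order by unix_time_seconds) as last_fetch_default,
           max(case when event = 'fetch' and flag = 2 then unix_time_seconds end)
               over (order by unix_time_seconds
                     rows between unbounded preceding and current row) as last_fetch_rows
    from table_event
    order by unix_time_seconds
""").fetchall()

# Both frames agree on every row because the timestamps are unique.
frames_agree = all(d == r for _, _, d, r in result)
# click_time is the click's timestamp minus the running max at that row.
click_times = [ts - d for ts, ev, d, _ in result if ev == "click"]
print(frames_agree, click_times)
```

If two rows shared the same unix_time_seconds, the default range frame would include both peers while the rows frame would not, so the two columns could differ in that case.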
The Spark SQL solution above was tested on Azure Databricks, RT 10.1 and Apache Spark 3.2.0.
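None of the queries above produce the fetches_before_click column that the question also asks for. One possible way to get it, offered only as a sketch: number the rows by how many flag = 2 fetches have occurred so far, then count fetch rows inside each such group. It assumes that a flag = 2 fetch starts a new group, that clicks before any flag = 2 fetch should be skipped, and that a second click in the same group would also count the fetches seen before the first click. The names grp, grouped and measured are made up, and the code runs under Python's sqlite3 module (SQLite 3.25 or newer).

```python
import sqlite3

# Rows from the question: (unix_time_seconds, event, flag).
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table table_event (unix_time_seconds int, event text, flag int)")
conn.executemany("insert into table_event values (?, ?, ?)", ROWS)

result = conn.execute("""
    with grouped as (
        -- grp increases by one at every flag = 2 fetch.
        select unix_time_seconds, event,
               sum(case when event = 'fetch' and flag = 2 then 1 else 0 end)
                   over (order by unix_time_seconds) as grp
        from table_event
    ),
    measured as (
        -- The earliest row of a group is its flag = 2 fetch.
        select unix_time_seconds, event, grp,
               unix_time_seconds - min(unix_time_seconds) over (partition by grp) as click_time,
               sum(case when event = 'fetch' then 1 else 0 end)
                   over (partition by grp order by unix_time_seconds) as fetches_before_click
        from grouped
    )
    select unix_time_seconds, click_time, fetches_before_click
    from measured
    where event = 'click' and grp > 0
    order by unix_time_seconds
""").fetchall()

for row in result:
    print(row)
```

On the question's data the four click rows come back with click_time 3, 17, 1, 2 and fetches_before_click 1, 2, 1, 1, which matches the desired table.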