How to retrieve rows using SQL Lag() function with condition

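I have been trying to solve this problem but am not getting the desired result. Please help, as I have been at it for days.

I have the table below, where the click_time and fetches_before_click columns are the result I need.

  1. Calculate the difference in the unix_time_seconds column between a "click" event and the preceding "fetch" event that has flag = 2.

  2. Count the rows between the "click" and the preceding "fetch" with flag = 2 that satisfies the condition above. The fetches_before_click column shows how many "fetch" events the user made before the "click" event.

The click_time column shows only the difference between the "click" row and the first row before the click that has flag = 2 and event = fetch.

I used the statement below, but I am not sure how to go backwards, find the first row with event "fetch" and flag 2, and subtract that row's value from the click. I used the lag() function to read previous rows, but I cannot get it to reach the preceding row that has flag = 2 and event = fetch.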

 select time, unix_time_seconds,event,flag,
 case when event = 'click'
   then unix_time_seconds - lag(unix_time_seconds, 1 ) over (order by flag desc)
 end as click_time

 from table_event

time          unix_time_seconds  event  flag  click_time  fetches_before_click
1/2/22 3:52   1641095536         fetch  2
1/2/22 3:52   1641095539         click  0     3           1
1/2/22 4:59   1641099553         fetch  2
1/2/22 4:59   1641099561         fetch  1
1/2/22 4:59   1641099568         fetch  1
1/2/22 4:59   1641099575         fetch  1
1/2/22 6:51   1641106302         fetch  2
1/2/22 6:51   1641106317         fetch  1
1/2/22 6:51   1641106319         click  0     17          2
1/3/22 6:15   1641190520         fetch  2
1/7/22 8:12   1641543135         fetch  2
1/10/22 1:09  1641776996         fetch  2
1/10/22 1:09  1641776997         click  0     1           1
1/10/22 1:12  1641777179         fetch  2
1/10/22 1:13  1641777181         click  0     2           1

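Simply adding partition by (case when event='click' or flag=2 then 1 end) to the window function will do the job. This condition excludes every row that is neither a 'click' event nor a 'fetch' event with flag = 2. Those rows fall into a separate NULL partition, so lag() never sees them from a 'click' row.

Instead of all fifteen rows, the lag() window function will then consider only the following 11 rows.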

time unix_time_seconds event flag
2022-02-01 03:52:16 1641095536 fetch 2
2022-02-01 03:52:19 1641095539 click 0
2022-02-01 04:59:13 1641099553 fetch 2
2022-02-01 06:51:42 1641106302 fetch 2
2022-02-01 06:51:59 1641106319 click 0
2022-03-01 06:15:20 1641190520 fetch 2
2022-07-01 08:12:15 1641543135 fetch 2
2022-10-01 01:09:56 1641776996 fetch 2
2022-10-01 01:09:57 1641776997 click 0
2022-10-01 01:12:59 1641777179 fetch 2
2022-10-01 01:13:01 1641777181 click 0

Schema and insert statements:

 create table table_event(time datetime,    unix_time_seconds int,  event varchar(10), flag int);
 insert into table_event values('22/2/1 3:52:16',   1641095536, 'fetch',    2);
 insert into table_event values('22/2/1 3:52:19',   1641095539, 'click',    0);
 insert into table_event values('22/2/1 4:59:13',   1641099553, 'fetch',    2);
 insert into table_event values('22/2/1 4:59:21',   1641099561, 'fetch',    1);
 insert into table_event values('22/2/1 4:59:28',   1641099568, 'fetch',    1);     
 insert into table_event values('22/2/1 4:59:35',   1641099575, 'fetch',    1);
 insert into table_event values('22/2/1 6:51:42',   1641106302, 'fetch',    2);
 insert into table_event values('22/2/1 6:51:57',   1641106317, 'fetch',    1);
 insert into table_event values('22/2/1 6:51:59',   1641106319, 'click',    0);
 insert into table_event values('22/3/1 6:15:20',   1641190520, 'fetch',    2);
 insert into table_event values('22/7/1 8:12:15',   1641543135, 'fetch',    2); 
 insert into table_event values('22/10/1 1:09:56',  1641776996, 'fetch',    2); 
 insert into table_event values('22/10/1 1:09:57',  1641776997, 'click',    0);
 insert into table_event values('22/10/1 1:12:59',  1641777179, 'fetch',    2); 
 insert into table_event values('22/10/1 1:13:01',  1641777181, 'click',    0);

Query:

 select time, unix_time_seconds, event, flag,
 case when event = 'click'
   then unix_time_seconds - lag(unix_time_seconds, 1) over (
          partition by (case when event='click' or (event='fetch' and flag=2) then 1 end)
          order by time)
 end as click_time
 from table_event
 order by unix_time_seconds

Output:

time unix_time_seconds event flag click_time
2022-02-01 03:52:16 1641095536 fetch 2 null
2022-02-01 03:52:19 1641095539 click 0 3
2022-02-01 04:59:13 1641099553 fetch 2 null
2022-02-01 04:59:21 1641099561 fetch 1 null
2022-02-01 04:59:28 1641099568 fetch 1 null
2022-02-01 04:59:35 1641099575 fetch 1 null
2022-02-01 06:51:42 1641106302 fetch 2 null
2022-02-01 06:51:57 1641106317 fetch 1 null
2022-02-01 06:51:59 1641106319 click 0 17
2022-03-01 06:15:20 1641190520 fetch 2 null
2022-07-01 08:12:15 1641543135 fetch 2 null
2022-10-01 01:09:56 1641776996 fetch 2 null
2022-10-01 01:09:57 1641776997 click 0 1
2022-10-01 01:12:59 1641777179 fetch 2 null
2022-10-01 01:13:01 1641777181 click 0 2
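
To try the partitioned lag() idea without a database server, here is a minimal sketch using Python's built-in sqlite3 module. It rests on a few assumptions that are not part of the answer above: SQLite 3.25 or newer (needed for window functions), a table trimmed to three columns, and ordering by unix_time_seconds instead of time. The function name click_times_with_lag is made up for illustration.

```python
import sqlite3

# (unix_time_seconds, event, flag) rows copied from the question's table.
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

# Clicks and flag=2 fetches share partition 1; every other row lands in
# the NULL partition, so lag() on a click row never sees a flag=1 fetch.
QUERY = """
    select unix_time_seconds,
           case when event = 'click' then
               unix_time_seconds - lag(unix_time_seconds, 1) over (
                   partition by (case when event = 'click'
                                        or (event = 'fetch' and flag = 2)
                                      then 1 end)
                   order by unix_time_seconds)
           end as click_time
    from table_event
    order by unix_time_seconds
"""


def click_times_with_lag(rows):
    """Return (unix_time_seconds, click_time) for every row (hypothetical helper)."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table table_event (unix_time_seconds int, event text, flag int)"
    )
    con.executemany("insert into table_event values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for unix_time_seconds, click_time in click_times_with_lag(ROWS):
        print(unix_time_seconds, click_time)
```

On the sample data the click rows come out as 3, 17, 1 and 2, and all other rows are NULL. One caveat: lag() returns the previous row of the partition, so if two clicks followed the same fetch, the second click would be measured against the first click rather than against the fetch.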


This should do the trick:

select time, unix_time_seconds,event,flag,
case when event = 'click' then 
    unix_time_seconds - (select top 1 unix_time_seconds from table_event y where event = 'fetch' and flag = 2 AND y.unix_time_seconds <= z.unix_time_seconds order by y.unix_time_seconds desc) 
end as click_time  from table_event z order by unix_time_seconds
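
The query above uses TOP 1, which is SQL Server syntax. As a rough sketch of the same correlated-subquery idea on an engine without TOP, the version below uses order by ... desc limit 1 in SQLite through Python's sqlite3 module, and adds a second subquery for the fetches_before_click column from the question. The anchor alias, the function name, and the reading of fetches_before_click as "fetch rows from the flag = 2 fetch up to the click" are my assumptions, not part of the original answer.

```python
import sqlite3

# (unix_time_seconds, event, flag) rows copied from the question's table.
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

# Inner query: for each click, find the latest flag=2 fetch at or before it
# ("anchor"). Outer query: subtract the anchor and count the fetch rows
# lying between the anchor and the click, both ends included.
QUERY = """
    select c.unix_time_seconds,
           c.unix_time_seconds - c.anchor as click_time,
           (select count(*)
              from table_event f
             where f.event = 'fetch'
               and f.unix_time_seconds between c.anchor
                                           and c.unix_time_seconds
           ) as fetches_before_click
    from (
        select z.unix_time_seconds,
               (select y.unix_time_seconds
                  from table_event y
                 where y.event = 'fetch'
                   and y.flag = 2
                   and y.unix_time_seconds <= z.unix_time_seconds
                 order by y.unix_time_seconds desc
                 limit 1) as anchor
        from table_event z
        where z.event = 'click'
    ) c
    order by c.unix_time_seconds
"""


def click_stats_with_subquery(rows):
    """Return (unix_time_seconds, click_time, fetches_before_click) per click."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table table_event (unix_time_seconds int, event text, flag int)"
    )
    con.executemany("insert into table_event values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in click_stats_with_subquery(ROWS):
        print(row)
```

Unlike the query above, this sketch returns only the click rows. The correlated subqueries are easy to read but run once per click, so on a large table the window-function approaches are probably the better fit.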

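There are multiple solutions to this problem; here is one of them.

Note the use of values to create a temporary data set.
Another way is to use the stack function. Also note the use of timestamp literals, e.g. timestamp '2022-01-02 03:52:16'.

The outer CASE statement ensures that click_time is shown only for click events.

The window function orders the records by unix_time_seconds and, for each record, takes the maximum unix_time_seconds up to and including that record (with only order by specified, the default frame is range between unbounded preceding and current row, which behaves like rows between unbounded preceding and current row here because the unix_time_seconds values are distinct).
The CASE statement inside the window function ensures that we only look at fetch events with flag = 2.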

with t (time, unix_time_seconds, event, flag)
as
(
  select   *
  from     values  (timestamp '2022-01-02 03:52:16', 1641095536, 'fetch', 2)
                  ,(timestamp '2022-01-02 03:52:19', 1641095539, 'click', 0)
                  ,(timestamp '2022-01-02 04:59:13', 1641099553, 'fetch', 2)
                  ,(timestamp '2022-01-02 04:59:21', 1641099561, 'fetch', 1)
                  ,(timestamp '2022-01-02 04:59:28', 1641099568, 'fetch', 1)
                  ,(timestamp '2022-01-02 04:59:35', 1641099575, 'fetch', 1)
                  ,(timestamp '2022-01-02 06:51:42', 1641106302, 'fetch', 2)
                  ,(timestamp '2022-01-02 06:51:57', 1641106317, 'fetch', 1)
                  ,(timestamp '2022-01-02 06:51:59', 1641106319, 'click', 0)
                  ,(timestamp '2022-01-03 06:15:20', 1641190520, 'fetch', 2)
                  ,(timestamp '2022-01-07 08:12:15', 1641543135, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:09:56', 1641776996, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:09:57', 1641776997, 'click', 0)
                  ,(timestamp '2022-01-10 01:12:59', 1641777179, 'fetch', 2)
                  ,(timestamp '2022-01-10 01:13:01', 1641777181, 'click', 0)
)
select    *
         ,case 
         
             when  event = 'click' 
             
             then  unix_time_seconds 
             
                 - max(case 
                           when event = 'fetch' and flag = 2 
                           then unix_time_seconds 
                       end
                       ) over (order by unix_time_seconds)
          end as click_time
          
from      t

time unix_time_seconds event flag click_time
2022-01-02T03:52:16.000+0000 1641095536 fetch 2 null
2022-01-02T03:52:19.000+0000 1641095539 click 0 3
2022-01-02T04:59:13.000+0000 1641099553 fetch 2 null
2022-01-02T04:59:21.000+0000 1641099561 fetch 1 null
2022-01-02T04:59:28.000+0000 1641099568 fetch 1 null
2022-01-02T04:59:35.000+0000 1641099575 fetch 1 null
2022-01-02T06:51:42.000+0000 1641106302 fetch 2 null
2022-01-02T06:51:57.000+0000 1641106317 fetch 1 null
2022-01-02T06:51:59.000+0000 1641106319 click 0 17
2022-01-03T06:15:20.000+0000 1641190520 fetch 2 null
2022-01-07T08:12:15.000+0000 1641543135 fetch 2 null
2022-01-10T01:09:56.000+0000 1641776996 fetch 2 null
2022-01-10T01:09:57.000+0000 1641776997 click 0 1
2022-01-10T01:12:59.000+0000 1641777179 fetch 2 null
2022-01-10T01:13:01.000+0000 1641777181 click 0 2

This solution was tested on Azure Databricks, RT 10.1 and Apache Spark 3.2.0.
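
None of the queries so far produce the fetches_before_click column from the question. Below is a hedged sketch that extends the running-max idea to cover it, run on SQLite (3.25 or newer) through Python's sqlite3 module rather than on Spark. The frame is written out explicitly, the running maximum is kept as a column named anchor, and rows sharing an anchor are treated as one group. The names marked, anchor and click_stats_with_running_max are made up for illustration, and counting every fetch from the flag = 2 fetch up to the click is my reading of the question.

```python
import sqlite3

# (unix_time_seconds, event, flag) rows copied from the question's table.
ROWS = [
    (1641095536, "fetch", 2), (1641095539, "click", 0),
    (1641099553, "fetch", 2), (1641099561, "fetch", 1),
    (1641099568, "fetch", 1), (1641099575, "fetch", 1),
    (1641106302, "fetch", 2), (1641106317, "fetch", 1),
    (1641106319, "click", 0), (1641190520, "fetch", 2),
    (1641543135, "fetch", 2), (1641776996, "fetch", 2),
    (1641776997, "click", 0), (1641777179, "fetch", 2),
    (1641777181, "click", 0),
]

# Step 1 (marked): running max of the flag=2 fetch times, explicit frame.
# Step 2: rows with the same anchor form one group; a running count of
# fetch rows inside the group gives fetches_before_click on click rows.
QUERY = """
    with marked as (
        select unix_time_seconds, event, flag,
               max(case when event = 'fetch' and flag = 2
                        then unix_time_seconds end)
                   over (order by unix_time_seconds
                         rows between unbounded preceding and current row)
                   as anchor
        from table_event
    )
    select unix_time_seconds,
           case when event = 'click'
                then unix_time_seconds - anchor
           end as click_time,
           case when event = 'click'
                then count(case when event = 'fetch' then 1 end)
                         over (partition by anchor
                               order by unix_time_seconds)
           end as fetches_before_click
    from marked
    order by unix_time_seconds
"""


def click_stats_with_running_max(rows):
    """Return (unix_time_seconds, click_time, fetches_before_click) per row."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "create table table_event (unix_time_seconds int, event text, flag int)"
    )
    con.executemany("insert into table_event values (?, ?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


if __name__ == "__main__":
    for row in click_stats_with_running_max(ROWS):
        print(row)
```

On the sample data the click rows get click_time 3, 17, 1, 2 and fetches_before_click 1, 2, 1, 1, which matches the table in the question. If a second click followed in the same group, it would be measured from the same flag = 2 fetch, which may or may not be what you want.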