Get users who took ride for 3 or more consecutive dates

I have the table below, which shows user_id and ride_date:

+---------+------------+
| user_id | ride_date  |
+---------+------------+
|       1 | 2019-11-01 |
|       1 | 2019-11-03 |
|       1 | 2019-11-05 |
|       2 | 2019-11-03 |
|       2 | 2019-11-04 |
|       2 | 2019-11-05 |
|       2 | 2019-11-06 |
|       3 | 2019-11-03 |
|       3 | 2019-11-04 |
|       3 | 2019-11-05 |
|       3 | 2019-11-06 |
|       4 | 2019-11-05 |
|       4 | 2019-11-07 |
|       4 | 2019-11-08 |
|       4 | 2019-11-09 |
|       5 | 2019-11-11 |
|       5 | 2019-11-13 |
+---------+------------+

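I want the user_id of users who took rides on 3 or more consecutive days, along with the dates on which they took those consecutive rides.

The desired result is as follows: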

+---------+-----------------------+
| user_id | consecutive_ride_date |
+---------+-----------------------+
|       2 | 2019-11-03            |
|       2 | 2019-11-04            |
|       2 | 2019-11-05            |
|       2 | 2019-11-06            |
|       3 | 2019-11-03            |
|       3 | 2019-11-04            |
|       3 | 2019-11-05            |
|       3 | 2019-11-06            |
|       4 | 2019-11-07            |
|       4 | 2019-11-08            |
|       4 | 2019-11-09            |
+---------+-----------------------+
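
For anyone who wants to run the queries in this thread locally, here is a minimal setup sketch. The table name Table1 is the one the answers use; the column types (INT and DATE) are an assumption, since the question only shows the data.

```sql
-- Hypothetical setup: Table1 is the name used in the answers;
-- the INT / DATE column types are assumed, not given in the question.
CREATE TABLE Table1 (
    user_id   INT  NOT NULL,
    ride_date DATE NOT NULL
);

INSERT INTO Table1 (user_id, ride_date)
VALUES (1, '2019-11-01'), (1, '2019-11-03'), (1, '2019-11-05'),
       (2, '2019-11-03'), (2, '2019-11-04'), (2, '2019-11-05'), (2, '2019-11-06'),
       (3, '2019-11-03'), (3, '2019-11-04'), (3, '2019-11-05'), (3, '2019-11-06'),
       (4, '2019-11-05'), (4, '2019-11-07'), (4, '2019-11-08'), (4, '2019-11-09'),
       (5, '2019-11-11'), (5, '2019-11-13');
```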

SQL Fiddle

This is a classic gaps-and-islands problem.

We can solve it like this:

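with data as (
    -- Subtracting the per-user row number from ride_date yields the same
    -- value (grp_field) for every date in a run of consecutive days.
    select user_id
          ,ride_date
          ,dateadd(day
                  ,-row_number() over(partition by user_id order by ride_date asc)
                  ,ride_date) as grp_field
      from Table1
)
,consecutive_days as (
    -- Count how many rides fall into each run.
    select user_id
          ,ride_date
          ,count(*) over(partition by user_id, grp_field) as cnt
      from data
)
select *
  from consecutive_days
 where cnt >= 3   -- keep only runs of 3 or more days
 order by user_id, ride_date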

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=7bb851d9a12966b54afb4d8b144f3d46
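
To see why subtracting the row number works, it can help to look at the intermediate values for a single user. The sketch below inlines user 4's rows from the question with a VALUES list; the alias rides and the column rn are only there for illustration. Dates that are consecutive end up with the same grp_field, while the isolated first ride gets a different one.

```sql
-- Illustration only: user 4's rides from the question, inlined.
select user_id
      ,ride_date
      ,row_number() over(partition by user_id order by ride_date) as rn
      ,dateadd(day
              ,-row_number() over(partition by user_id order by ride_date)
              ,ride_date) as grp_field
  from (values (4, cast('2019-11-05' as date))
              ,(4, cast('2019-11-07' as date))
              ,(4, cast('2019-11-08' as date))
              ,(4, cast('2019-11-09' as date))
       ) as rides(user_id, ride_date)
 order by ride_date;
-- ride_date   rn  grp_field
-- 2019-11-05  1   2019-11-04
-- 2019-11-07  2   2019-11-05
-- 2019-11-08  3   2019-11-05
-- 2019-11-09  4   2019-11-05
```

Note that this relies on each user having at most one row per date, which holds for the sample data.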

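Here is one way to solve this gaps-and-islands problem:

  • first, assign a rank to each user's rides with row_number(), and recover the previous ride_date (aliased lag_ride_date)

  • then, compare the date of the previous ride with the current one in a conditional running sum, which increases whenever the two dates are consecutive; subtracting this sum from the ride's rank gives groups (aliased grp) that represent consecutive rides spaced 1 day apart

  • do a window count of how many records there are in each group (aliased cnt)

  • filter on the records whose window count is at least 3

Query: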

select user_id, ride_date
from (
    select 
        t.*,
        count(*) over(partition by user_id, grp) cnt
    from (
        select
            t.*,
            rn1 
                - sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
                over(partition by user_id order by ride_date) grp
        from (
            select 
                t.*,
                row_number() over(partition by user_id order by ride_date) rn1,
                lag(ride_date) over(partition by user_id order by ride_date) lag_ride_date
            from Table1 t
        ) t
    ) t
) t
where cnt >= 3

Demo on DB Fiddle
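
As a rough illustration of how grp comes out, the sketch below runs the two inner steps on user 4's rows only, inlined with a VALUES list. The running sum is pulled out into its own column, consecutive_so_far, which is a hypothetical name used here just to make the arithmetic visible; the query above computes the same thing inline.

```sql
-- Illustration only: user 4's rides from the question, inlined.
select t.*,
       rn1 - consecutive_so_far as grp
from (
    select
        t.*,
        -- running count of rides that fall exactly 1 day after the previous one
        sum(case when ride_date = dateadd(day, 1, lag_ride_date) then 1 else 0 end)
            over(partition by user_id order by ride_date) as consecutive_so_far
    from (
        select
            r.*,
            row_number() over(partition by user_id order by ride_date) as rn1,
            lag(ride_date) over(partition by user_id order by ride_date) as lag_ride_date
        from (values (4, cast('2019-11-05' as date)),
                     (4, cast('2019-11-07' as date)),
                     (4, cast('2019-11-08' as date)),
                     (4, cast('2019-11-09' as date))
             ) as r(user_id, ride_date)
    ) t
) t
order by ride_date;
-- ride_date   rn1  lag_ride_date  consecutive_so_far  grp
-- 2019-11-05  1    NULL           0                   1
-- 2019-11-07  2    2019-11-05     0                   2
-- 2019-11-08  3    2019-11-07     1                   2
-- 2019-11-09  4    2019-11-08     2                   2
```

The rank and the running sum both go up by 1 on a consecutive day, so their difference stays put; after a gap only the rank goes up, which starts a new group.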

With the LAG() and LEAD() window functions:

with cte as (
  select *,
    datediff(
      day,
      lag([ride_date]) over (partition by [user_id] order by [ride_date]),
      [ride_date]
    ) prev1,
    datediff(
      day,
      lag([ride_date], 2) over (partition by [user_id] order by [ride_date]),
      [ride_date]
    ) prev2,
    datediff(
      day,
      [ride_date],
      lead([ride_date]) over (partition by [user_id] order by [ride_date])
    ) next1,
    datediff(
      day,
      [ride_date],
      lead([ride_date], 2) over (partition by [user_id] order by [ride_date])
    ) next2
  from Table1  
)
select [user_id], [ride_date]
from cte
where 
  (prev1 = 1 and prev2 = 2) or
  (prev1 = 1 and next1 = 1) or
  (next1 = 1 and next2 = 2)

See the demo.
Results:

> user_id | ride_date          
> ------: | :---------
>       2 | 03/11/2019
>       2 | 04/11/2019
>       2 | 05/11/2019
>       2 | 06/11/2019
>       3 | 03/11/2019
>       3 | 04/11/2019
>       3 | 05/11/2019
>       3 | 06/11/2019
>       4 | 07/11/2019
>       4 | 08/11/2019
>       4 | 09/11/2019

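There is no need to apply a gaps-and-islands approach to this problem. It is much simpler to solve than that.

You can return the user and the first date of each run just by using LEAD():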

SELECT t1.*
FROM (SELECT t1.*,
             LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
      FROM table1 t1
     ) t1
WHERE ride_date_2 = DATEADD(day, 2, ride_date);
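
A small sketch of what this returns for a run longer than 3 days, using user 2's rows from the question inlined with a VALUES list (the aliases v and r are only for illustration). It assumes each user has at most one row per date, as in the sample data; otherwise the row two positions ahead would not necessarily be two days ahead.

```sql
-- Illustration only: user 2's four consecutive rides, inlined.
SELECT r.user_id, r.ride_date, r.ride_date_2
FROM (SELECT v.*,
             LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) AS ride_date_2
      FROM (VALUES (2, CAST('2019-11-03' AS date)),
                   (2, CAST('2019-11-04' AS date)),
                   (2, CAST('2019-11-05' AS date)),
                   (2, CAST('2019-11-06' AS date))
           ) v(user_id, ride_date)
     ) r
WHERE r.ride_date_2 = DATEADD(day, 2, r.ride_date)
ORDER BY r.ride_date;
-- user_id  ride_date   ride_date_2
-- 2        2019-11-03  2019-11-05
-- 2        2019-11-04  2019-11-06
```

A 4-day run produces two overlapping 3-day windows, so the same user can appear more than once.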

If you want the actual dates, you can unpivot the results:

SELECT DISTINCT t1.user_id, v.ride_date
FROM (SELECT t1.*,
             LEAD(ride_date, 2) OVER (PARTITION BY user_id ORDER BY ride_date) as ride_date_2
      FROM table1 t1
     ) t1 CROSS APPLY
     (VALUES (t1.ride_date),
             (DATEADD(day, 1, t1.ride_date)),
             (DATEADD(day, 2, t1.ride_date))
     ) v(ride_date)
WHERE t1.ride_date_2 = DATEADD(day, 2, t1.ride_date)
ORDER BY t1.user_id, v.ride_date;