What SQL query can be used to limit continuous periods by parameter value, and then to calculate datediff inside them?

Question

I have a table of phone calls with the columns user_id, call_date, and city, where city can be either A or B. It looks like this:

user_id	call_date	city
1	2021-01-01	A
1	2021-01-02	B
1	2021-01-03	B
1	2021-01-05	B
1	2021-01-10	A
1	2021-01-12	B
1	2021-01-16	A
2	2021-01-17	A
2	2021-01-20	B
2	2021-01-22	B
2	2021-01-23	A
2	2021-01-24	B
2	2021-01-26	B
2	2021-01-30	A

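For this table, we need to select all the periods each user spent in city B. A period is measured in days: it starts with the first call made from city B and ends when the next call from city A is made. So for user_id = 1 the first period runs from 2021-01-02 to 2021-01-10, which is 8 days. Each user can have several such periods.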

The result should be the following table:

user_id	period_1	period_2
1	8	4
2	3	6

How can I limit the periods according to these conditions, and then calculate the datediff inside each period? Thanks.

Answer 1

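This is a typical gaps-and-islands problem. You first need to group consecutive rows, and then find the first call_date of the next group. Below is sample code for Postgres; it can be adapted to another DBMS by using that system's function for calculating the difference in days.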

with a (user_id, call_date, city)
as (
  select *
  from ( values
    ('1', date '2021-01-01', 'A'),
    ('1', date '2021-01-02', 'B'),
    ('1', date '2021-01-03', 'B'),
    ('1', date '2021-01-05', 'B'),
    ('1', date '2021-01-10', 'A'),
    ('1', date '2021-01-12', 'B'),
    ('1', date '2021-01-16', 'A'),
    ('2', date '2021-01-17', 'A'),
    ('2', date '2021-01-20', 'B'),
    ('2', date '2021-01-22', 'B'),
    ('2', date '2021-01-23', 'A'),
    ('2', date '2021-01-24', 'B'),
    ('2', date '2021-01-26', 'B'),
    ('2', date '2021-01-30', 'A')
  ) as t
)
, grp as (
  /*Identify groups*/
  select a.*,
    /*Group consecutive rows: within an unbroken run of the
      same city, the per-user rank and the per-user-and-city
      rank both grow by one, so their difference stays constant.
      The difference changes as soon as the city changes.
    */
    dense_rank() over(
      partition by user_id
      order by call_date asc
    ) - 
    dense_rank() over(
      partition by user_id, city
      order by call_date asc
    ) as grp,
    /*Get next call date*/
    lead(call_date, 1, call_date)
      over(
        partition by user_id
        order by call_date asc
      ) as next_dt
  from a
)
select
  user_id,
  city,
  min(call_date) as dt_from,
  max(next_dt) as dt_to,
  max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3

user_id | city | dt_from    | dt_to      | diff
:------ | :--- | :--------- | :--------- | ---:
1       | B    | 2021-01-02 | 2021-01-10 |    8
1       | B    | 2021-01-12 | 2021-01-16 |    4
2       | B    | 2021-01-20 | 2021-01-23 |    3
2       | B    | 2021-01-24 | 2021-01-30 |    6
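
To see why the difference of the two ranks identifies runs of consecutive rows, the sketch below prints the intermediate values for user 1 only. It is an illustration rather than part of the answer's query: it runs the same dense_rank expressions on SQLite through Python's sqlite3 module (assuming a SQLite build with window functions, 3.25 or later), and the names calls, rank_all, rank_in_city and run_id are made up for this example.

```python
import sqlite3

# Sample rows for user 1 only, copied from the question.
user1_calls = [
    ("2021-01-01", "A"), ("2021-01-02", "B"), ("2021-01-03", "B"),
    ("2021-01-05", "B"), ("2021-01-10", "A"), ("2021-01-12", "B"),
    ("2021-01-16", "A"),
]

# Same idea as the grp expression in the query above, without the
# user_id partition because the table holds a single user here.
RANK_SQL = """
select call_date, city, rank_all, rank_in_city,
       rank_all - rank_in_city as run_id
from (
  select call_date, city,
    dense_rank() over (order by call_date) as rank_all,
    dense_rank() over (partition by city order by call_date) as rank_in_city
  from calls
) as ranked
order by call_date
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table calls (call_date text, city text)")
conn.executemany("insert into calls values (?, ?)", user1_calls)
trace = conn.execute(RANK_SQL).fetchall()
conn.close()

for row in trace:
    print(row)
```

The three B rows from 2021-01-02 to 2021-01-05 share one run_id, and the B row on 2021-01-12 gets a different one, so grouping by this value separates the two periods. Two different cities can end up with the same run_id, which is why the query also groups by city.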

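
The question asks for one row per user with the columns period_1 and period_2, while the query above returns one row per period. One possible way to get that shape is to number each user's periods and pivot them with conditional aggregation. The sketch below is an assumption-laden port, not the answer's own code: it runs on SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later), uses julianday() because SQLite has no date subtraction operator, and hard-codes two period columns. The names calls, runs, run_id, periods and numbered are invented for the example.

```python
import sqlite3

# Sample data from the question.
rows = [
    (1, "2021-01-01", "A"), (1, "2021-01-02", "B"), (1, "2021-01-03", "B"),
    (1, "2021-01-05", "B"), (1, "2021-01-10", "A"), (1, "2021-01-12", "B"),
    (1, "2021-01-16", "A"), (2, "2021-01-17", "A"), (2, "2021-01-20", "B"),
    (2, "2021-01-22", "B"), (2, "2021-01-23", "A"), (2, "2021-01-24", "B"),
    (2, "2021-01-26", "B"), (2, "2021-01-30", "A"),
]

PIVOT_SQL = """
with runs as (
  -- Same grouping and next-call logic as the Postgres query.
  select user_id, call_date, city,
    dense_rank() over (partition by user_id order by call_date)
    - dense_rank() over (partition by user_id, city order by call_date) as run_id,
    lead(call_date, 1, call_date)
      over (partition by user_id order by call_date) as next_dt
  from calls
),
periods as (
  -- One row per period in city B; julianday() gives the day difference.
  select user_id,
    min(call_date) as dt_from,
    cast(julianday(max(next_dt)) - julianday(min(call_date)) as integer) as diff
  from runs
  where city = 'B'
  group by user_id, run_id
),
numbered as (
  -- Number each user's periods in chronological order.
  select user_id, diff,
    row_number() over (partition by user_id order by dt_from) as rn
  from periods
)
-- Pivot the numbered periods into fixed columns.
select user_id,
  max(case when rn = 1 then diff end) as period_1,
  max(case when rn = 2 then diff end) as period_2
from numbered
group by user_id
order by user_id
"""

def period_pivot(call_rows):
    """Load the rows into an in-memory SQLite table and run the pivot query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table calls (user_id integer, call_date text, city text)")
    conn.executemany("insert into calls values (?, ?, ?)", call_rows)
    result = conn.execute(PIVOT_SQL).fetchall()
    conn.close()
    return result

pivot = period_pivot(rows)
for user_id, period_1, period_2 in pivot:
    print(user_id, period_1, period_2)
```

A user with a third period would need another max(case ...) column, since plain SQL cannot produce a variable number of columns. A user with only one period gets NULL in period_2.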

sql

datediff

gaps-and-islands