What SQL query can be used to split rows into continuous periods by a parameter value, and then calculate DATEDIFF within each period?

I have a table of phone calls with the columns user_id, call_date and city, where city can be either A or B. It looks like this:

user_id | call_date  | city
:------ | :--------- | :---
1       | 2021-01-01 | A
1       | 2021-01-02 | B
1       | 2021-01-03 | B
1       | 2021-01-05 | B
1       | 2021-01-10 | A
1       | 2021-01-12 | B
1       | 2021-01-16 | A
2       | 2021-01-17 | A
2       | 2021-01-20 | B
2       | 2021-01-22 | B
2       | 2021-01-23 | A
2       | 2021-01-24 | B
2       | 2021-01-26 | B
2       | 2021-01-30 | A

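From this table we need to select, for each user, all the periods spent in city B. A period is measured in days: it starts with the first call made from city B and ends with the next call made from city A. So for user_id = 1 the first period starts on 2021-01-02 and ends on 2021-01-10. Each user can have several such periods.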
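To pin down the period rule before writing any SQL, here is a minimal plain-Python sketch. The helper `b_periods` and the `SAMPLE`/`ROWS` names are hypothetical, made up for illustration. It walks each user's calls in date order, opens a period at the first B call of a run and closes it at the next A call. As an assumption, a B period that is still open at the user's last call is dropped.

```python
from datetime import date

# Sample data copied from the question: user_id, call_date, city
SAMPLE = """\
1 2021-01-01 A
1 2021-01-02 B
1 2021-01-03 B
1 2021-01-05 B
1 2021-01-10 A
1 2021-01-12 B
1 2021-01-16 A
2 2021-01-17 A
2 2021-01-20 B
2 2021-01-22 B
2 2021-01-23 A
2 2021-01-24 B
2 2021-01-26 B
2 2021-01-30 A
"""

ROWS = [(int(u), date.fromisoformat(d), c)
        for u, d, c in map(str.split, SAMPLE.splitlines())]


def b_periods(rows):
    """Hypothetical helper: {user_id: [length in days of each B period]}."""
    periods = {}
    open_since = {}  # user_id -> call_date that opened the current B period
    for user_id, call_date, city in sorted(rows):  # per user, by date
        periods.setdefault(user_id, [])
        if city == "B":
            # Only the first B call of a run opens the period.
            open_since.setdefault(user_id, call_date)
        elif user_id in open_since:
            # The next non-B (i.e. A) call closes the open period.
            start = open_since.pop(user_id)
            periods[user_id].append((call_date - start).days)
    # Assumption: periods still open after a user's last call are dropped.
    return periods


print(b_periods(ROWS))  # {1: [8, 4], 2: [3, 6]}
```

The printed lists match the period_1 and period_2 values the question expects.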

The result should look like the following table:

user_id | period_1 | period_2
:------ | -------: | -------:
1       |        8 |        4
2       |        3 |        6

How can I split the data into periods according to these conditions and then calculate DATEDIFF within each period? Thanks.

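This is a classic gaps-and-islands problem. First group consecutive rows that have the same city, then take the first call_date of the following group as the end of each B period. Below is sample code for Postgres; to adapt it to another DBMS, replace the date subtraction with that system's function for the difference in days.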
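The grouping trick used in the `grp` CTE can be seen on a single user's sequence of cities. This small Python sketch (the function `island_keys` is hypothetical) mimics the two rankings by counting rows overall and per city, assuming no two calls of a user share a date. Rows in the same consecutive run of a city end up with the same `(city, difference)` key.

```python
from itertools import groupby


def island_keys(cities):
    """Return (city, overall_rank - rank_within_city) for each row, in order.

    Mirrors the SQL: one ranking over all of a user's calls and one
    ranking restricted to calls from the same city (no ties assumed).
    """
    seen_per_city = {}
    keys = []
    for overall_rank, city in enumerate(cities, start=1):
        seen_per_city[city] = seen_per_city.get(city, 0) + 1
        keys.append((city, overall_rank - seen_per_city[city]))
    return keys


# City sequence of user_id = 1 from the question, ordered by call_date.
keys = island_keys("ABBBABA")
print(keys)
# [('A', 0), ('B', 1), ('B', 1), ('B', 1), ('A', 3), ('B', 2), ('A', 4)]

# Consecutive equal keys form one island; count the rows in each.
print([(key, len(list(run))) for key, run in groupby(keys)])
# [(('A', 0), 1), (('B', 1), 3), (('A', 3), 1), (('B', 2), 1), (('A', 4), 1)]
```

The city has to be part of the key (the query groups by `user_id, grp, city`) because runs of different cities can share a difference value: in "ABA" the B row and the last A row both get 1.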

```sql
with a (user_id, call_date, city)
as (
  select *
  from ( values
    ('1', date '2021-01-01', 'A'),
    ('1', date '2021-01-02', 'B'),
    ('1', date '2021-01-03', 'B'),
    ('1', date '2021-01-05', 'B'),
    ('1', date '2021-01-10', 'A'),
    ('1', date '2021-01-12', 'B'),
    ('1', date '2021-01-16', 'A'),
    ('2', date '2021-01-17', 'A'),
    ('2', date '2021-01-20', 'B'),
    ('2', date '2021-01-22', 'B'),
    ('2', date '2021-01-23', 'A'),
    ('2', date '2021-01-24', 'B'),
    ('2', date '2021-01-26', 'B'),
    ('2', date '2021-01-30', 'A')
  ) as t
)
, grp as (
  /*Identify groups (islands) of consecutive calls from the same city*/
  select a.*,
    /*Consecutive rows with the same city get the same value:
      both rankings grow by 1 on each such row, so their
      difference stays constant; when the city changes, only
      the per-user ranking advances and the difference jumps.
    */
    dense_rank() over(
      partition by user_id
      order by call_date asc
    ) -
    dense_rank() over(
      partition by user_id, city
      order by call_date asc
    ) as grp,
    /*Next call date of the same user;
      the user's last row defaults to its own call_date*/
    lead(call_date, 1, call_date)
      over(
        partition by user_id
        order by call_date asc
      ) as next_dt
  from a
)
select
  user_id,
  city,
  min(call_date) as dt_from,
  max(next_dt) as dt_to,
  max(next_dt) - min(call_date) as diff  /*date - date gives days in Postgres*/
from grp
where city = 'B'
group by user_id, grp, city
order by user_id, dt_from
```

user_id | city | dt_from    | dt_to      | diff
:------ | :--- | :--------- | :--------- | ---:
1       | B    | 2021-01-02 | 2021-01-10 |    8
1       | B    | 2021-01-12 | 2021-01-16 |    4
2       | B    | 2021-01-20 | 2021-01-23 |    3
2       | B    | 2021-01-24 | 2021-01-30 |    6

db<>fiddle here
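The query above returns one row per period, while the question asks for one row per user with `period_1` and `period_2` columns. Below is a sketch of one way to do that, run against SQLite through Python's built-in `sqlite3` module so it can be tried without a Postgres server. Assumptions: SQLite 3.25 or newer (needed for window functions), dates stored as ISO `TEXT`, `julianday()` standing in for Postgres date subtraction, and at most two periods per user. The names `calls`, `ranked`, `islands`, `numbered`, `QUERY` and `period_table` are made up for this example.

```python
import sqlite3

# Sample data copied from the question: user_id, call_date, city
SAMPLE = """\
1 2021-01-01 A
1 2021-01-02 B
1 2021-01-03 B
1 2021-01-05 B
1 2021-01-10 A
1 2021-01-12 B
1 2021-01-16 A
2 2021-01-17 A
2 2021-01-20 B
2 2021-01-22 B
2 2021-01-23 A
2 2021-01-24 B
2 2021-01-26 B
2 2021-01-30 A
"""
ROWS = [(int(u), d, c) for u, d, c in map(str.split, SAMPLE.splitlines())]

QUERY = """
WITH ranked AS (  -- same island key and next_dt as the Postgres answer
  SELECT user_id, call_date, city,
         dense_rank() OVER (PARTITION BY user_id ORDER BY call_date)
       - dense_rank() OVER (PARTITION BY user_id, city ORDER BY call_date) AS grp,
         lead(call_date, 1, call_date)
           OVER (PARTITION BY user_id ORDER BY call_date) AS next_dt
  FROM calls
),
islands AS (  -- one row per B period
  SELECT user_id, min(call_date) AS dt_from, max(next_dt) AS dt_to
  FROM ranked
  WHERE city = 'B'
  GROUP BY user_id, grp
),
numbered AS (  -- length in days, and period number per user in date order
  SELECT user_id,
         CAST(julianday(dt_to) - julianday(dt_from) AS INTEGER) AS diff,
         row_number() OVER (PARTITION BY user_id ORDER BY dt_from) AS n
  FROM islands
)
SELECT user_id,
       max(CASE WHEN n = 1 THEN diff END) AS period_1,
       max(CASE WHEN n = 2 THEN diff END) AS period_2
FROM numbered
GROUP BY user_id
ORDER BY user_id
"""


def period_table(rows):
    """Load rows into an in-memory SQLite table and run the pivot query."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE calls (user_id INTEGER, call_date TEXT, city TEXT)")
        con.executemany("INSERT INTO calls VALUES (?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


for row in period_table(ROWS):
    print(row)
# (1, 8, 4)
# (2, 3, 6)
```

In Postgres itself the same pivot can be written with the aggregate `FILTER` clause, e.g. `max(diff) filter (where n = 1) as period_1`. Either way, SQL needs the number of output columns fixed in advance, so users with more than two periods would need more columns (or the `crosstab` function from the `tablefunc` extension); a user with only one period gets NULL in `period_2`.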