What SQL query can be used to limit continuous periods by a parameter value, and then calculate the datediff inside them?
I have a table of phone calls with the columns user_id, call_date and city, where city can be either A or B.
It looks like this:
user_id | call_date | city |
---|---|---|
1 | 2021-01-01 | A |
1 | 2021-01-02 | B |
1 | 2021-01-03 | B |
1 | 2021-01-05 | B |
1 | 2021-01-10 | A |
1 | 2021-01-12 | B |
1 | 2021-01-16 | A |
2 | 2021-01-17 | A |
2 | 2021-01-20 | B |
2 | 2021-01-22 | B |
2 | 2021-01-23 | A |
2 | 2021-01-24 | B |
2 | 2021-01-26 | B |
2 | 2021-01-30 | A |
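For this table, we need to select all the periods that each user spent in city B.
These periods are measured in days: each one starts with the first call made from city B and ends with the next call made from city A.
So for user_id = 1 the first period starts on 2021-01-02 and ends on 2021-01-10. Each user can have several such periods.
The result should be a table like this: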
user_id | period_1 | period_2 |
---|---|---|
1 | 8 | 4 |
2 | 3 | 6 |
How can I limit the periods according to the condition described above and then calculate the datediff within each period?
Thanks.
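This is a typical gaps-and-islands problem. You first need to group the consecutive rows, and then find the first call_date of the next group.
Below is sample code for Postgres. It can be adapted to another DBMS by using that DBMS's function for calculating the difference in days.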
with a (user_id, call_date, city) as (
  select *
  from ( values
    ('1', date '2021-01-01', 'A'),
    ('1', date '2021-01-02', 'B'),
    ('1', date '2021-01-03', 'B'),
    ('1', date '2021-01-05', 'B'),
    ('1', date '2021-01-10', 'A'),
    ('1', date '2021-01-12', 'B'),
    ('1', date '2021-01-16', 'A'),
    ('2', date '2021-01-17', 'A'),
    ('2', date '2021-01-20', 'B'),
    ('2', date '2021-01-22', 'B'),
    ('2', date '2021-01-23', 'A'),
    ('2', date '2021-01-24', 'B'),
    ('2', date '2021-01-26', 'B'),
    ('2', date '2021-01-30', 'A')
  ) as t
)
, grp as (
  /* Identify groups */
  select a.*,
    /* Grouping of consecutive rows: within a run of calls from the
       same city, the per-user rank and the per-user-per-city rank
       grow together, so their difference stays constant. When the
       city changes, the difference changes. The value identifies a
       group only together with city, hence city in the GROUP BY.
    */
    dense_rank() over(
      partition by user_id
      order by call_date asc
    ) -
    dense_rank() over(
      partition by user_id, city
      order by call_date asc
    ) as grp,
    /* Get the next call date (the row's own date if there is no later call) */
    lead(call_date, 1, call_date) over(
      partition by user_id
      order by call_date asc
    ) as next_dt
  from a
)
select
  user_id,
  city,
  min(call_date) as dt_from,
  max(next_dt) as dt_to,
  max(next_dt) - min(call_date) as diff
from grp
where city = 'B'
group by user_id, grp, city
order by 1, 3
user_id | city | dt_from | dt_to | diff
:------ | :--- | :--------- | :--------- | ---:
1 | B | 2021-01-02 | 2021-01-10 | 8
1 | B | 2021-01-12 | 2021-01-16 | 4
2 | B | 2021-01-20 | 2021-01-23 | 3
2 | B | 2021-01-24 | 2021-01-30 | 6
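
The query above returns one row per period, while the question asks for one row per user with period_1 and period_2 columns. As a rough sketch of how the same idea might be ported to another DBMS and pivoted into that layout, here is a SQLite version run through Python's standard sqlite3 module. The table name `calls`, the column `island_id`, the CTE names and the function `b_periods` are made up for this illustration. It assumes a SQLite build with window-function support (3.25 or later), call_date stored as ISO `YYYY-MM-DD` text, and at most two periods per user, since the pivot columns are hard-coded.

```python
import sqlite3

# Sample rows copied from the question: (user_id, call_date, city).
ROWS = [
    (1, "2021-01-01", "A"), (1, "2021-01-02", "B"), (1, "2021-01-03", "B"),
    (1, "2021-01-05", "B"), (1, "2021-01-10", "A"), (1, "2021-01-12", "B"),
    (1, "2021-01-16", "A"), (2, "2021-01-17", "A"), (2, "2021-01-20", "B"),
    (2, "2021-01-22", "B"), (2, "2021-01-23", "A"), (2, "2021-01-24", "B"),
    (2, "2021-01-26", "B"), (2, "2021-01-30", "A"),
]

QUERY = """
with islands as (
    select
        user_id, call_date, city,
        -- constant within a run of consecutive calls from the same city
        dense_rank() over (partition by user_id order by call_date)
          - dense_rank() over (partition by user_id, city order by call_date)
          as island_id,
        -- date of the user's next call (own date if there is none)
        lead(call_date, 1, call_date)
          over (partition by user_id order by call_date) as next_dt
    from calls
),
periods as (
    -- one row per stay in city B
    select user_id, min(call_date) as dt_from, max(next_dt) as dt_to
    from islands
    where city = 'B'
    group by user_id, island_id
),
numbered as (
    select
        user_id,
        -- julianday() stands in for the Postgres date subtraction
        cast(julianday(dt_to) - julianday(dt_from) as integer) as diff,
        row_number() over (partition by user_id order by dt_from) as rn
    from periods
)
-- pivot: hard-coded for at most two periods per user (an assumption)
select
    user_id,
    max(case when rn = 1 then diff end) as period_1,
    max(case when rn = 2 then diff end) as period_2
from numbered
group by user_id
order by user_id
"""


def b_periods(rows):
    """Load rows into an in-memory SQLite table and run the pivot query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table calls (user_id integer, call_date text, city text)"
        )
        conn.executemany("insert into calls values (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in b_periods(ROWS):
        print(row)  # (1, 8, 4) then (2, 3, 6)
```

If a user can have more than two periods, the fixed `case when rn = ...` columns are no longer enough. In that case it is probably simpler to keep the one-row-per-period result and pivot in the client, or to use the pivot facility of the target DBMS.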