Extract number of days per week from multiple date ranges
I have a table trips in PostgreSQL 10.5:
id  start_date  end_date
----------------------------
1   02/01/2019  02/03/2019
2   02/02/2019  02/03/2019
3   02/06/2019  02/07/2019
4   02/06/2019  02/14/2019
5   02/06/2019  02/06/2019
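I want to count the number of trip days that overlap each given week. Trips in the table have inclusive bounds. Weeks start on Monday and end on Sunday. The expected result would be: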
week_of   days_utilized
------------------------
01/28/19  5
02/04/19  8
02/11/19  4
Calendar for reference:
Monday 01/28/19 - Sunday 02/03/19
Monday 02/04/19 - Sunday 02/10/19
Monday 02/11/19 - Sunday 02/17/19
I know how to write this in the programming language I use, but I would prefer to do it in Postgres, and I am not sure where to start...
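For anyone who wants to reproduce the problem, here is a minimal setup sketch. The question only shows the data, so the column types (integer and date) and the CHECK constraint are assumptions, and the dates are written as ISO literals so they do not depend on the DateStyle setting.

```sql
-- Assumed schema: the question does not state the column types.
CREATE TABLE trips (
    id         integer PRIMARY KEY,
    start_date date NOT NULL,
    end_date   date NOT NULL,
    CHECK (start_date <= end_date)   -- both bounds are inclusive
);

-- The five sample rows from the question.
INSERT INTO trips (id, start_date, end_date) VALUES
    (1, '2019-02-01', '2019-02-03'),
    (2, '2019-02-02', '2019-02-03'),
    (3, '2019-02-06', '2019-02-07'),
    (4, '2019-02-06', '2019-02-14'),
    (5, '2019-02-06', '2019-02-06');
```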
You seem to want generate_series() with a join and group by. To count the trips that overlap each week:
select gs.wk, count(t.id) as num_trips
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk)
     left join trips t
       on gs.wk <= t.end_date and
          gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
EDIT:
I see you want the days covered. That is slightly more work in the aggregation:
select gs.wk, count(t.id) as num_trips,
       sum( 1 +
            extract(day from (least(gs.wk + interval '6 day', t.end_date) - greatest(gs.wk, t.start_date)))
          ) as days_utilized
from generate_series('2019-01-28'::date, '2019-02-11'::date, interval '1 week') gs(wk)
     left join trips t
       on gs.wk <= t.end_date and
          gs.wk + interval '6 day' >= t.start_date
group by gs.wk
order by gs.wk;
Note: this does not return exactly the same results as the ones you posted. I think these are correct.
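As a cross-check, the same per-week total can be sketched with plain date arithmetic, where subtracting two dates yields an integer number of days and extract() is not needed. This is a variant under assumptions, not the answerer's query: the subquery alias w, the cast to date, and the coalesce to 0 for weeks without trips are all choices made here for illustration.

```sql
-- Variant sketch: integer day counts via date - date.
select w.wk as week_of,
       -- clip each trip to the week, then count inclusive days
       coalesce(sum(least(w.wk + 6, t.end_date)
                    - greatest(w.wk, t.start_date) + 1), 0) as days_utilized
from  (select gs::date as wk            -- Mondays as plain dates
       from generate_series(timestamp '2019-01-28',
                            timestamp '2019-02-11',
                            interval '1 week') gs) w
      left join trips t
        on t.start_date <= w.wk + 6     -- trip starts on or before Sunday
       and t.end_date   >= w.wk         -- trip ends on or after Monday
group by w.wk
order by w.wk;
```

With the five sample rows from the question this should yield 5, 8 and 4 for the weeks of 2019-01-28, 2019-02-04 and 2019-02-11.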
I would consider range types for this. They make the computations simpler and clearer with range operators: I use overlap && and intersection * below. We can also use a functional GiST or SP-GiST index to make the query fast if the table is big. Like:
CREATE INDEX trip_range_idx ON trips
USING gist (daterange(start_date, end_date, '[]'));
Then your query can use this index:
SELECT week
     , count(overlap) AS ct_trips
     , sum(upper(overlap) - lower(overlap)) AS days_utilized
FROM  (
   SELECT week, trip * week AS overlap
   FROM  (
      SELECT daterange(mon::date, mon::date + 7) AS week
      FROM   generate_series(timestamp '2019-01-28'
                           , timestamp '2019-02-11'
                           , interval '1 week') mon
      ) w
   LEFT   JOIN (SELECT daterange(start_date, end_date, '[]') FROM trips) t(trip) ON trip && week
   ) sub
GROUP  BY 1
ORDER  BY 1;
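Note that a daterange by default consists of an inclusive lower bound and an exclusive upper bound. Your ranges include both the lower and the upper bound, so create the daterange with daterange(start_date, end_date, '[]'). Because daterange is a discrete range type, PostgreSQL stores it in the canonical [) form, and upper() still returns the exclusive upper bound. The expression upper(overlap) - lower(overlap) therefore counts days correctly.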
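A quick way to see this is a one-off query on a literal range. The dates below are simply trip 3 from the question, and the aliases r, lo, hi and days are made up for the illustration.

```sql
-- An inclusive two-day range, built the same way as in the query above.
SELECT r
     , lower(r)            AS lo    -- inclusive lower bound
     , upper(r)            AS hi    -- exclusive upper bound
     , upper(r) - lower(r) AS days  -- date - date gives an integer
FROM  (SELECT daterange(date '2019-02-06', date '2019-02-07', '[]') AS r) x;
```

The range is displayed as [2019-02-06,2019-02-08), so hi is 2019-02-08 and days is 2, which matches the two calendar days the trip covers.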
I use generate_series() with timestamp input for a reason:
- Generating time series between two dates in PostgreSQL
Related:
- Perform this hours of operation query in PostgreSQL
Or, if you don't want to use range types, consider the OVERLAPS operator:
- Find overlapping date ranges in PostgreSQL
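A rough sketch of what the OVERLAPS route could look like for counting trips per week follows. OVERLAPS treats each period as half-open (start <= time < end), so the inclusive end_date is shifted by one day and each week is written as Monday to the following Monday. Those two adjustments, and the aliases w and wk, are assumptions made here rather than something taken from the linked answer.

```sql
-- Sketch: trips per week with OVERLAPS instead of range types.
SELECT w.wk AS week_of
     , count(t.id) AS ct_trips
FROM  (SELECT gs::date AS wk             -- Mondays as plain dates
       FROM   generate_series(timestamp '2019-01-28'
                            , timestamp '2019-02-11'
                            , interval '1 week') gs) w
LEFT   JOIN trips t
       -- end_date + 1 turns the inclusive end into an exclusive one
       ON (t.start_date, t.end_date + 1) OVERLAPS (w.wk, w.wk + 7)
GROUP  BY w.wk
ORDER  BY w.wk;
```

With the sample rows from the question this should report 2, 3 and 1 trips for the three weeks. Unlike the range-type version, a plain OVERLAPS condition cannot use the GiST index on the daterange expression.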