Given a date range, how can we break it up into N contiguous sub-intervals?

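I am accessing some data through an API where I need to provide the date range for my request, e.g. start='20100101', end='20150415'. I thought I would speed this up by breaking the date range into non-overlapping intervals and using multiprocessing on each interval.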

My problem is that the way I am breaking up the date range does not consistently give me the expected result. Here is what I have done:

from datetime import date

begin = '20100101'
end = '20101231'

Suppose we want to split this into a few parts. First, I convert the strings to dates:

def get_yyyy_mm_dd(yyyymmdd):
    # given string 'yyyymmdd' return (yyyy, mm, dd)
    year = yyyymmdd[0:4]
    month = yyyymmdd[4:6]
    day = yyyymmdd[6:]
    return int(year), int(month), int(day)

y1, m1, d1 = get_yyyy_mm_dd(begin)
d1 = date(y1, m1, d1)
y2, m2, d2 = get_yyyy_mm_dd(end)
d2 = date(y2, m2, d2)

Then I split the range into sub-intervals:

def remove_tack(dates_list):
    # given a list of dates in form YYYY-MM-DD return a list of strings in form 'YYYYMMDD'
    tackless = []
    for d in dates_list:
        s = str(d)
        tackless.append(s[0:4]+s[5:7]+s[8:])
    return tackless

def divide_date(date1, date2, intervals):
    dates = [date1]
    for i in range(0, intervals):
        dates.append(dates[i] + (date2 - date1)/intervals)
    return remove_tack(dates)

Using the begin and end above we get:

listdates = divide_date(d1, d2, 4)
print(listdates)  # ['20100101', '20100402', '20100702', '20101001', '20101231'] looks correct

But if I use these dates instead:

begin = '20150101'
end = '20150228'

...

listdates = divide_date(d1, d2, 4)
print(listdates)  # ['20150101', '20150115', '20150129', '20150212', '20150226']

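I am missing two days at the end of February. My application does not need times or time zones, and I don't mind installing another library.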

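I would actually take a different approach and rely on timedelta and date addition to determine the non-overlapping ranges.

Implementation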

from datetime import datetime

def date_range(start, end, intv):
    # Yield intv + 1 evenly spaced 'YYYYMMDD' boundaries from start to end.
    start = datetime.strptime(start, "%Y%m%d")
    end = datetime.strptime(end, "%Y%m%d")
    diff = (end - start) / intv
    for i in range(intv):
        yield (start + diff * i).strftime("%Y%m%d")
    yield end.strftime("%Y%m%d")

Execution

>>> begin = '20150101'
>>> end = '20150228'
>>> list(date_range(begin, end, 4))
['20150101', '20150115', '20150130', '20150213', '20150228']
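
To connect this back to the question's goal of running the requests in parallel, here is a minimal sketch that pairs up consecutive boundaries and hands each pair to a worker. The fetch function is a hypothetical stand-in for the real API call, and I use a thread pool rather than multiprocessing on the assumption that the work is mostly waiting on the network.

```python
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime

def date_range(start, end, intv):
    # Same generator as above: intv + 1 boundary strings from start to end.
    start = datetime.strptime(start, "%Y%m%d")
    end = datetime.strptime(end, "%Y%m%d")
    diff = (end - start) / intv
    for i in range(intv):
        yield (start + diff * i).strftime("%Y%m%d")
    yield end.strftime("%Y%m%d")

def fetch(bounds):
    # Hypothetical placeholder for the real API request; it only echoes its input.
    begin, end = bounds
    return "requested %s..%s" % (begin, end)

boundaries = list(date_range("20150101", "20150228", 4))
# Pair each boundary with the next one: (b0, b1), (b1, b2), ...
pairs = list(zip(boundaries[:-1], boundaries[1:]))

with ThreadPoolExecutor(max_workers=4) as pool:
    # pool.map keeps the results in the same order as the pairs.
    results = list(pool.map(fetch, pairs))
```

Note that adjacent pairs share a boundary date, so if the API treats both ends of a range as inclusive, that day would be requested twice.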

Could you use datetime.date objects instead?

If you do:

import datetime
begin = datetime.date(2001, 1, 1)
end = datetime.date(2010, 12, 31)

intervals = 4

date_list = []

delta = (end - begin) / intervals
for i in range(1, intervals + 1):
    date_list.append((begin+i*delta).strftime('%Y%m%d'))

then date_list should contain the end date of each interval.
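
The loop above collects only the end date of each interval. If each request needs an explicit (start, end) pair with no shared days, one option is to work in whole days and spread the remainder over the first few intervals. This is only a sketch: split_inclusive is a hypothetical name, and it assumes the API treats both ends of a range as inclusive, which the question does not specify.

```python
from datetime import datetime, timedelta

def split_inclusive(begin, end, n):
    """Split the inclusive range begin..end ('YYYYMMDD' strings) into at most
    n inclusive, non-overlapping (start, end) string pairs."""
    if n < 1:
        raise ValueError("n must be at least 1")
    first = datetime.strptime(begin, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    total_days = (last - first).days + 1  # both endpoints count
    base, extra = divmod(total_days, n)
    ranges = []
    start = first
    for i in range(n):
        # The first `extra` intervals get one additional day.
        length = base + (1 if i < extra else 0)
        if length == 0:
            break  # more intervals requested than days available
        stop = start + timedelta(days=length - 1)
        ranges.append((start.strftime("%Y%m%d"), stop.strftime("%Y%m%d")))
        start = stop + timedelta(days=1)
    return ranges

print(split_inclusive("20150101", "20150228", 4))
# [('20150101', '20150115'), ('20150116', '20150130'),
#  ('20150131', '20150214'), ('20150215', '20150228')]
```

Because only integer day counts are involved, nothing is lost to rounding and the last interval always ends on the requested end date.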

You should change date to datetime:

from datetime import date, datetime, timedelta

begin = '20150101'
end = '20150228'

def get_yyyy_mm_dd(yyyymmdd):
  # given string 'yyyymmdd' return (yyyy, mm, dd)
  year = yyyymmdd[0:4]
  month = yyyymmdd[4:6]
  day = yyyymmdd[6:]
  return int(year), int(month), int(day)

y1, m1, d1 = get_yyyy_mm_dd(begin)
d1 = datetime(y1, m1, d1)
y2, m2, d2 = get_yyyy_mm_dd(end)
d2 = datetime(y2, m2, d2)

def remove_tack(dates_list):
  # given a list of dates in form YYYY-MM-DD return a list of strings in form 'YYYYMMDD'
  tackless = []
  for d in dates_list:
    s = str(d)
    tackless.append(s[0:4]+s[5:7]+s[8:])
  return tackless

def divide_date(date1, date2, intervals):
  dates = [date1]
  delta = (date2 - date1).total_seconds() / intervals
  for i in range(0, intervals):
    dates.append(dates[i] + timedelta(0,delta))
  return remove_tack(dates)

listdates = divide_date(d1, d2, 4)
print(listdates)

Result:

['20150101 00:00:00', '20150115 12:00:00', '20150130 00:00:00', '20150213 12:00:00', '20150228 00:00:00']
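
As far as I can tell, the switch to datetime helps because adding a timedelta to a date ignores the sub-day part of the timedelta, whereas a datetime keeps it. The small check below illustrates this with the 14.5-day step from the question's February example; the variable names are just for illustration.

```python
from datetime import date, datetime, timedelta

step = timedelta(days=14, hours=12)  # (Feb 28 - Jan 1) / 4 = 14.5 days

# date arithmetic ignores the hours, so every step loses half a day.
d = date(2015, 1, 1)
for _ in range(4):
    d = d + step
print(d)  # 2015-02-26

# datetime keeps the hours, so four steps land exactly on the end date.
t = datetime(2015, 1, 1)
for _ in range(4):
    t = t + step
print(t)  # 2015-02-28 00:00:00
```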

Using a DatetimeIndex and Periods in pandas, together with a dictionary comprehension:

import datetime as dt
import pandas as pd

begin = '20100101'
end = '20101231'

start = dt.datetime.strptime(begin, '%Y%m%d')
finish = dt.datetime.strptime(end, '%Y%m%d')

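# pd.date_range replaces the DatetimeIndex(start=..., end=...) constructor,
# which newer pandas versions no longer accept.
dates = pd.date_range(start=start, end=finish, freq='D').tolist()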
quarters = [d.to_period('Q') for d in dates]
df = pd.DataFrame([quarters, dates], index=['Quarter', 'Date']).T

quarterly_dates = {str(q): [ts.strftime('%Y%m%d') 
                            for ts in df[df.Quarter == q].Date.values.tolist()]
                           for q in quarters}

>>> quarterly_dates
{'2010Q1': ['20100101',
  '20100102',
  '20100103',
  '20100104',
  '20100105',
...
  '20101227',
  '20101228',
  '20101229',
  '20101230',
  '20101231']}

>>> quarterly_dates.keys()
['2010Q1', '2010Q2', '2010Q3', '2010Q4']

I created a function that includes the end date in the date split.


from dateutil import rrule
from dateutil.relativedelta import relativedelta
from dateutil.rrule import DAILY


def date_split(start_date, end_date, freq=DAILY, interval=1):
    """
    Split the range from start_date to end_date into steps of `interval` units of `freq`.

    :param start_date: datetime at which the split starts
    :param end_date: datetime at which the split ends (always yielded last)
    :param freq: an rrule frequency constant such as SECONDLY, MINUTELY, HOURLY, DAILY or WEEKLY
    :param interval: the interval between each freq iteration
    :return: iterator of datetime objects
    """
    # Remove microseconds from the datetimes, since the smallest allowed frequency is seconds.
    start_date = start_date.replace(microsecond=0)
    end_date = end_date.replace(microsecond=0)
    assert end_date > start_date, "end_date should be greater than start_date."
    date_intervals = rrule.rrule(freq, interval=interval, dtstart=start_date, until=end_date)
    for date in date_intervals:
        yield date
    if date != end_date:
        yield end_date

If you want to split a date range by a number of days, you can use the following snippet.

import datetime

firstDate = datetime.datetime.strptime("2019-01-01", "%Y-%m-%d")
lastDate = datetime.datetime.strptime("2019-03-30", "%Y-%m-%d")
numberOfDays = 15
startdate = firstDate
startdatelist = []
enddatelist = []

while startdate <= lastDate:
    enddate = startdate + datetime.timedelta(days=numberOfDays - 1)
    startdatelist.append(startdate.strftime("%Y-%m-%d 00:00:00"))
    if enddate > lastDate:
        # Clamp the final chunk to the end of the requested range.
        enddatelist.append(lastDate.strftime("%Y-%m-%d 23:59:59"))
    else:
        enddatelist.append(enddate.strftime("%Y-%m-%d 23:59:59"))
    startdate = enddate + datetime.timedelta(days=1)

for a, b in zip(startdatelist, enddatelist):
    print(str(a) + "  -  " + str(b))
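
The same idea can be wrapped in a reusable generator. This is a sketch rather than a drop-in replacement: chunk_by_days is a hypothetical name, and it works on date objects and leaves the string formatting to the caller.

```python
from datetime import date, timedelta

def chunk_by_days(first, last, days):
    """Yield inclusive (start, end) date pairs covering first..last,
    each spanning at most `days` days."""
    if days < 1:
        raise ValueError("days must be at least 1")
    start = first
    while start <= last:
        # Clamp the final chunk so it never runs past `last`.
        end = min(start + timedelta(days=days - 1), last)
        yield start, end
        start = end + timedelta(days=1)

for a, b in chunk_by_days(date(2019, 1, 1), date(2019, 3, 30), 15):
    print(a, "-", b)
```

With the dates from the snippet above this gives six chunks, the last one running from 2019-03-17 to 2019-03-30.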

Borrowing from @Abhijit's answer, here is a version that takes a max_capacity_days parameter, from which the number of intervals is computed internally.

import math
from datetime import datetime, timedelta
from typing import Iterable

def date_groups(
    start_at: datetime,
    end_at: datetime,
    max_capacity_days: float) -> Iterable[datetime]:

    capacity = timedelta(days=max_capacity_days)
    # Number of boundaries before end_at. Rounding up avoids yielding
    # end_at twice when the range is an exact multiple of capacity.
    interval = math.ceil((end_at - start_at) / capacity)
    for i in range(interval):
        yield start_at + capacity * i
    yield end_at

Usage

>>> list(map(str, date_groups(datetime(2021,1,1), datetime(2021,5,1), 30)))
['2021-01-01 00:00:00', '2021-01-31 00:00:00', '2021-03-02 00:00:00', '2021-04-01 00:00:00', '2021-05-01 00:00:00']

>>> list(map(str, date_groups(datetime(2021,1,1), datetime(2021,5,1), 50)))
['2021-01-01 00:00:00', '2021-02-20 00:00:00', '2021-04-11 00:00:00', '2021-05-01 00:00:00']

Practical use

Acting on each pair of dates:

>>> dg = date_groups(datetime(2021,2,1, 1,33,33), datetime(2021,5,5), 30)
>>> dates = list(dg)
>>> for start_at, end_at in zip(dates[:-1],dates[1:]):
...     print(f"Delta in [{start_at}, {end_at}] = {(end_at-start_at)}")
...
Delta in [2021-02-01 01:33:33, 2021-03-03 01:33:33] = 30 days, 0:00:00
Delta in [2021-03-03 01:33:33, 2021-04-02 01:33:33] = 30 days, 0:00:00
Delta in [2021-04-02 01:33:33, 2021-05-02 01:33:33] = 30 days, 0:00:00
Delta in [2021-05-02 01:33:33, 2021-05-05 00:00:00] = 2 days, 22:26:27