min function with key parameter to search through dates - is there a faster way?

I am using the following code to search a list of dates for the nearest previous date:

def nearest_previous_date(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates

    Args:
        list_of_dates (list): list of datetime objects
        pivot_date (datetime): reference date
    
    Returns:
        (datetime): datetime immediately before or equal to reference date, if none satisfy criteria returns
        first date in list
    """
    return min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))

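I need to call this function many times, so I would like it to be as efficient as possible. Currently it takes about 200 microseconds to search a list of 23 dates and find the relevant one. That does not sound like much, but it does not scale well. Is there a way to make this function more efficient?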

Here is an example:

pivot_date = datetime(day=21, month=7, year=2019)
list_of_dates
DatetimeIndex(['2015-06-30', '2015-09-30', '2015-12-31', '2016-03-31',
           '2016-06-30', '2016-09-30', '2016-12-30', '2017-03-31',
           '2017-06-30', '2017-09-29', '2017-12-29', '2018-03-30',
           '2018-06-30', '2018-10-01', '2019-01-01', '2019-03-29',
           '2019-07-01', '2019-10-01', '2019-12-31', '2020-03-31',
           '2020-06-30', '2020-09-30', '2020-12-31'],
          dtype='datetime64[ns]', name='effectiveDate', freq=None)

%%timeit
min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))

191 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

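Since datetime objects are orderable, the nearest date before the reference date is simply the largest date that is not later than the reference date: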

def nearest_previous_date(list_of_dates, pivot_date):
    return max((date for date in list_of_dates if date <= pivot_date), default=list_of_dates[0])

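Assuming the list is sorted in ascending order, you can use binary search instead, which scales better (O(log n) rather than O(n) per lookup):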

from bisect import bisect

def nearest_previous_date(list_of_dates, pivot_date):
    return list_of_dates[max(bisect(list_of_dates, pivot_date) - 1, 0)]
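
As a quick sanity check, here is a minimal sketch of the binary-search version run against a few of the dates from the question. It assumes a plain, ascending-sorted Python list of datetime objects rather than the DatetimeIndex shown in the question, and it spells out bisect_right (the function that bisect aliases) to make the handling of equal dates explicit.

```python
from bisect import bisect_right
from datetime import datetime


def nearest_previous_date(list_of_dates, pivot_date):
    # bisect_right returns the insertion point after any entries equal to
    # pivot_date, so index - 1 is the last date <= pivot_date.
    # max(..., 0) falls back to the first date when no date qualifies.
    return list_of_dates[max(bisect_right(list_of_dates, pivot_date) - 1, 0)]


# A few dates taken from the question, as a plain sorted list (assumption:
# the real data is a pandas DatetimeIndex, which this sketch does not need).
dates = [
    datetime(2019, 1, 1),
    datetime(2019, 3, 29),
    datetime(2019, 7, 1),
    datetime(2019, 10, 1),
]

print(nearest_previous_date(dates, datetime(2019, 7, 21)))  # pivot between two dates
print(nearest_previous_date(dates, datetime(2019, 7, 1)))   # pivot equal to a date
print(nearest_previous_date(dates, datetime(2018, 1, 1)))   # pivot before every date
```

The three calls cover the cases described in the docstring of the question: a pivot between two dates, a pivot that is itself in the list, and a pivot earlier than every date (where the first date is returned).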

Solution suggested by @jasonharper:

import bisect

def nearest_previous_date_NEW(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates
    Important: assumes list_of_dates is sorted ascending

    Args:
       list_of_dates (list): list of datetime objects
       pivot_date (datetime): reference date
    
    Returns:
        (datetime): datetime immediately before or equal to reference date, if 
        none satisfy criteria returns first date in list
    """
    # bisect_right (not bisect_left), so that a date equal to pivot_date is itself returned
    return list_of_dates[max(0, bisect.bisect_right(list_of_dates, pivot_date) - 1)]

It is indeed much faster:

47.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
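
To reproduce the comparison outside IPython, something like the following sketch with the standard timeit module should work. The names with_min and with_bisect and the evenly spaced test dates are made up for illustration. Absolute numbers will differ from the ones above, which were measured on a pandas DatetimeIndex rather than a plain list.

```python
import timeit
from bisect import bisect_right
from datetime import datetime, timedelta


def with_min(list_of_dates, pivot_date):
    # Original linear scan from the question.
    return min(
        list_of_dates,
        key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"),
    )


def with_bisect(list_of_dates, pivot_date):
    # Binary search; requires list_of_dates to be sorted ascending.
    return list_of_dates[max(bisect_right(list_of_dates, pivot_date) - 1, 0)]


# Hypothetical data: 23 dates spaced 91 days apart, starting 2015-06-30.
dates = [datetime(2015, 6, 30) + timedelta(days=91 * i) for i in range(23)]
pivot = datetime(2019, 7, 21)

# Both approaches should agree before their timings are compared.
assert with_min(dates, pivot) == with_bisect(dates, pivot)

calls = 10_000
for fn in (with_min, with_bisect):
    seconds = timeit.timeit(lambda: fn(dates, pivot), number=calls)
    print(f"{fn.__name__}: {seconds / calls * 1e6:.2f} µs per call")
```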