Using min() with a key parameter to search through dates: is there a faster way?
I am using the following code to search through a list of dates and find the nearest previous date:
def nearest_previous_date(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates
    Args:
        list_of_dates (list): list of datetime objects
        pivot_date (datetime): reference date
    Returns:
        (datetime): datetime immediately before or equal to reference date, if none satisfy criteria returns
        first date in list
    """
    return min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))
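I need to call this function many times, so I want it to be as efficient as possible. Currently it takes about 200 µs to search a list of 23 dates and find the relevant one. That doesn't sound like much, but it doesn't scale well. Is there a way to make this function more efficient?
Here is an example: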
pivot_date = datetime(day=21, month=7, year=2019)
list_of_dates
DatetimeIndex(['2015-06-30', '2015-09-30', '2015-12-31', '2016-03-31',
               '2016-06-30', '2016-09-30', '2016-12-30', '2017-03-31',
               '2017-06-30', '2017-09-29', '2017-12-29', '2018-03-30',
               '2018-06-30', '2018-10-01', '2019-01-01', '2019-03-29',
               '2019-07-01', '2019-10-01', '2019-12-31', '2020-03-31',
               '2020-06-30', '2020-09-30', '2020-12-31'],
              dtype='datetime64[ns]', name='effectiveDate', freq=None)
%%timeit
min(list_of_dates, key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))
191 µs ± 5.4 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
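To reproduce the example without pandas, here is a minimal sketch that rebuilds the 23 dates as a plain list of `datetime` objects (an assumption: the original code passes the pandas DatetimeIndex directly) and runs the original min()-based function. It also shows where the documented fallback comes from: when no date is on or before the pivot, every key is `inf`, and min() returns the first minimal element it encounters, i.e. the first date in the list.

```python
from datetime import datetime

# The 23 dates from the DatetimeIndex above, as plain datetime objects
# (assumption: a plain list stands in for the pandas DatetimeIndex).
RAW = [
    "2015-06-30", "2015-09-30", "2015-12-31", "2016-03-31",
    "2016-06-30", "2016-09-30", "2016-12-30", "2017-03-31",
    "2017-06-30", "2017-09-29", "2017-12-29", "2018-03-30",
    "2018-06-30", "2018-10-01", "2019-01-01", "2019-03-29",
    "2019-07-01", "2019-10-01", "2019-12-31", "2020-03-31",
    "2020-06-30", "2020-09-30", "2020-12-31",
]
list_of_dates = [datetime.strptime(s, "%Y-%m-%d") for s in RAW]
pivot_date = datetime(day=21, month=7, year=2019)


def nearest_previous_date(list_of_dates, pivot_date):
    # Dates after the pivot get an infinite key, so min() picks the date
    # with the smallest non-negative gap (in days) to the pivot.
    return min(list_of_dates,
               key=lambda x: (pivot_date - x).days if x <= pivot_date else float("inf"))


print(nearest_previous_date(list_of_dates, pivot_date))          # 2019-07-01 00:00:00
# No date qualifies: all keys are inf, so the first element wins.
print(nearest_previous_date(list_of_dates, datetime(2000, 1, 1)))  # 2015-06-30 00:00:00
```

Every call computes a timedelta and a key for all n dates, so the cost grows linearly with the length of the list.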
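Since datetime objects are ordered, the nearest date before the reference date is simply the "largest" date that is on or before the reference date: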
def nearest_previous_date(list_of_dates, pivot_date):
    return max((date for date in list_of_dates if date <= pivot_date), default=list_of_dates[0])
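A minimal sketch of the max()-based version on a small hypothetical list, showing that a date equal to the pivot is returned itself and that `default` is used when no date qualifies. It is still a linear scan, but it only compares dates instead of building a timedelta and a key for every element. Note that `default=list_of_dates[0]` is evaluated before the scan starts, so an empty list raises IndexError instead of falling back.

```python
from datetime import datetime


def nearest_previous_date(list_of_dates, pivot_date):
    # datetime objects are totally ordered, so the latest date that is
    # <= pivot_date is the max() of the filtered candidates.
    # `default` is returned when the generator yields nothing.
    return max((d for d in list_of_dates if d <= pivot_date),
               default=list_of_dates[0])


# Hypothetical sample data (sorted, so list_of_dates[0] is the earliest date).
dates = [datetime(2019, 3, 29), datetime(2019, 7, 1), datetime(2019, 10, 1)]

print(nearest_previous_date(dates, datetime(2019, 7, 21)))  # 2019-07-01 00:00:00
print(nearest_previous_date(dates, datetime(2019, 7, 1)))   # 2019-07-01 00:00:00 (equal date counts)
print(nearest_previous_date(dates, datetime(2019, 1, 1)))   # 2019-03-29 00:00:00 (fallback)
```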
If the list is sorted, you can use a binary search instead, which scales much better (O(log n) per lookup instead of O(n)):
from bisect import bisect
def nearest_previous_date(list_of_dates, pivot_date):
    return list_of_dates[max(bisect(list_of_dates, pivot_date) - 1, 0)]
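The choice between `bisect_right` (which `bisect.bisect` is an alias of) and `bisect_left` matters when the pivot is exactly equal to one of the dates. The sketch below uses a small hypothetical sorted list to show that only `bisect_right(...) - 1` returns a date that is *equal to* the pivot; `bisect_left(...) - 1` returns the last date strictly before it.

```python
from bisect import bisect_left, bisect_right
from datetime import datetime

# Hypothetical sorted data; the pivot equals the middle element.
dates = [datetime(2019, 3, 29), datetime(2019, 7, 1), datetime(2019, 10, 1)]
pivot = datetime(2019, 7, 1)

# bisect_right returns the insertion point *after* any equal entries,
# so index - 1 is the last date <= pivot.
i_right = bisect_right(dates, pivot)  # 2
# bisect_left returns the insertion point *before* any equal entries,
# so index - 1 is the last date strictly < pivot.
i_left = bisect_left(dates, pivot)    # 1

print(dates[max(i_right - 1, 0)])  # 2019-07-01 00:00:00
print(dates[max(i_left - 1, 0)])   # 2019-03-29 00:00:00
```

The `max(..., 0)` clamp reproduces the original fallback: when the pivot precedes every date, the index would be -1 (which Python would read as the *last* element), so it is clamped to the first date instead.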
Solution suggested by @jasonharper:
import bisect

def nearest_previous_date_NEW(list_of_dates, pivot_date):
    """ Helper function to find the nearest previous date in a list of dates
    Important: assumes list_of_dates is sorted ascending
    Args:
        list_of_dates (list): list of datetime objects
        pivot_date (datetime): reference date
    Returns:
        (datetime): datetime immediately before or equal to reference date, if
        none satisfy criteria returns first date in list
    """
    # bisect_right (not bisect_left), so a date equal to pivot_date is returned itself
    return list_of_dates[max(0, bisect.bisect_right(list_of_dates, pivot_date) - 1)]
That is indeed much faster:
47.4 µs ± 1.84 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
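Before swapping implementations, it may be worth cross-checking the binary search against the original linear version, including pivots that fall exactly on a stored date (the case where `bisect_left` would go wrong). A minimal sketch, assuming a hypothetical data set of 500 distinct sorted dates at day granularity; the timing lines are only a rough comparison and the absolute numbers depend on the machine.

```python
import random
import timeit
from bisect import bisect_right
from datetime import datetime, timedelta


def nearest_linear(dates, pivot):
    # Reference: the original min()-with-key approach, O(n) per call.
    return min(dates, key=lambda x: (pivot - x).days if x <= pivot else float("inf"))


def nearest_bisect(dates, pivot):
    # Binary search, O(log n) per call; dates must be sorted ascending.
    return dates[max(bisect_right(dates, pivot) - 1, 0)]


random.seed(0)
start = datetime(2015, 1, 1)
# Hypothetical test data: 500 distinct dates, sorted ascending.
dates = sorted(start + timedelta(days=d) for d in random.sample(range(3000), 500))

# Pivots before, between, on, and after the stored dates.
pivots = [start + timedelta(days=d) for d in range(-10, 3010, 7)] + dates[:50]
mismatches = [p for p in pivots if nearest_linear(dates, p) != nearest_bisect(dates, p)]
print("mismatches:", len(mismatches))

# Rough per-call timing of both versions on the same pivot.
pivot = dates[len(dates) // 2]
for fn in (nearest_linear, nearest_bisect):
    t = timeit.timeit(lambda: fn(dates, pivot), number=1000)
    print(f"{fn.__name__}: {t / 1000 * 1e6:.1f} µs per call")
```

Because the dates are distinct, the linear version's smallest non-negative gap is unique and identical to the last date `<= pivot`, so the two functions should agree on every pivot.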