Why does DateOffset.rollback() not work the way I expect it to with days / hours?

I am trying to shift an arbitrary input pd.Timestamp back to the start of its offset, unless it is already on the offset.

The following code works for BusinessMonthEnd, MonthEnd, MonthStart, etc.:

import pandas as pd
from pandas.tseries.offsets import *


def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
    return freq.rollback(t)


assert to_start(pd.to_datetime("2021-09-27"), MonthEnd()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-09-27"), MonthBegin()) == pd.to_datetime("2021-09-01")
assert to_start(pd.to_datetime("2021-08-27"), BMonthEnd()) == pd.to_datetime("2021-07-30")
assert to_start(pd.to_datetime("2021-08-27"), YearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-27"), BYearBegin()) == pd.to_datetime("2021-01-01")

# also it works nicely with holiday calendars
from pandas.tseries.holiday import USFederalHolidayCalendar

us_fed_biz_days = CustomBusinessDay(calendar=USFederalHolidayCalendar())
memorial_day = pd.to_datetime("2021-05-31")
the_friday_before_memorial_day = pd.to_datetime("2021-05-28")
assert to_start(memorial_day, us_fed_biz_days) == the_friday_before_memorial_day

However (and this is driving me crazy), it does not seem to work for Day, BusinessDay, Week, Hour, etc.:

assert to_start(pd.to_datetime("2021-08-27 05:00"), Day()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:00"), BDay()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:15"), Hour()) == pd.to_datetime("2021-08-27 05:00")
assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-24")

I also tried this:

def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
    return pd.Period(t, freq=freq).start_time

Ironically, this works for the second set of assertions but not for the first.

Are my expectations for the assertions above unreasonable, and if so, what am I missing?

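Based on your examples, I think you want to floor the given date/time for fixed frequencies (e.g. an hour), whereas for variable frequencies (e.g. business month end) you should call .rollback, which moves the timestamp back only when the check in .is_on_offset fails (see the source code linked in the comments on the question). For fixed frequencies such as Day and Hour that check passes for every timestamp, so .rollback returns the input unchanged.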

For example:

def to_start(t: pd.Timestamp, freq: pd.DateOffset) -> pd.Timestamp:
    try:
        return t.floor(freq) # fixed frequencies should just floor the date/time
    except ValueError: # if freq is variable, we fall into here...
        return freq.rollback(t.floor("D"))

Tests:

# variable offsets that depend on the date
assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthEnd()) == pd.to_datetime("2021-08-31")
assert to_start(pd.to_datetime("2021-09-27"), pd.tseries.offsets.MonthBegin()) == pd.to_datetime("2021-09-01")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BMonthEnd()) == pd.to_datetime("2021-07-30")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.YearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-27"), pd.tseries.offsets.BYearBegin()) == pd.to_datetime("2021-01-01")
assert to_start(pd.to_datetime("2021-08-26"), pd.tseries.frequencies.to_offset("W-MON")) == pd.to_datetime("2021-08-23")
assert to_start(pd.to_datetime("2021-08-28 05:00"), pd.tseries.offsets.BDay()) == pd.to_datetime("2021-08-27")
# fixed offsets (e.g. an hour is always an hour)
assert to_start(pd.to_datetime("2021-08-27 05:00"), pd.tseries.offsets.Day()) == pd.to_datetime("2021-08-27")
assert to_start(pd.to_datetime("2021-08-27 05:15"), pd.tseries.offsets.Hour()) == pd.to_datetime("2021-08-27 05:00")
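
As a side check on why plain rollback leaves the Hour and BDay inputs from the question untouched, here is a minimal sketch that inspects is_on_offset directly. The variable names are only illustrative, and the behavior shown is what I would expect from the pandas versions current when this was asked, where Hour is a fixed "tick" offset; newer releases may differ.

```python
import pandas as pd
from pandas.tseries.offsets import BDay, Hour, MonthEnd

t = pd.Timestamp("2021-08-27 05:15")  # a Friday

# Hour is a fixed ("tick") offset: every timestamp counts as on-offset,
# so rollback has nothing to do and hands t back unchanged.
hour_on_offset = Hour().is_on_offset(t)
hour_rolled = Hour().rollback(t)

# BDay only looks at the weekday, not at the time of day.
bday_on_offset = BDay().is_on_offset(t)
bday_rolled = BDay().rollback(t)

# MonthEnd: the 27th is not a month end, so rollback does move the date,
# but it keeps the time of day because normalize defaults to False.
month_end_on_offset = MonthEnd().is_on_offset(t)
month_end_rolled = MonthEnd().rollback(t)

print(hour_on_offset, hour_rolled)
print(bday_on_offset, bday_rolled)
print(month_end_on_offset, month_end_rolled)
```

The MonthEnd case is also the reason the fallback branch above calls t.floor("D") before rollback: without it, the time of day would be carried over into the rolled-back result.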