pandas timeseries offset BusinessMonthBegin doesn't roll over on the new month

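I have a few pandas functions that collect the relative start dates of previous time periods. Today, on the first day of a new month, I noticed that my business month start (BMS) function returned an unexpected timestamp: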

# so.py
import pandas
import time

def now(format='ms', normalize=True):
    obj = pandas.Timestamp.now(tz='America/Toronto').normalize()
    if normalize == False:
        obj = pandas.Timestamp.now(tz='America/Toronto')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

def BMS(multiplier, format='ms'):
    obj = now(format=None) + pandas.tseries.offsets.BusinessMonthBegin(multiplier)
    obj = pandas.Timestamp(obj).floor(freq='D')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

print(f'my function: {BMS(-4, format=None)}')

# python3 so.py
2021-10-01 00:00:00-04:00
#

2021-10-01 00:00:00-04:00 is unexpected, because it is the same timestamp that was returned yesterday:


yesterday = pandas.Timestamp.now(tz='America/Toronto').normalize() - pandas.Timedelta(days=1)
print(f'yesterday: {yesterday + pandas.tseries.offsets.BusinessMonthBegin(-4)}')

# yesterday: 2021-10-01 00:00:00-04:00

Because today is the start of a new month, I expected BMS(-4, format=None) to return 2021-11-01 00:00:00-04:00.

In case it is needed, here is a more basic MRE that reproduces what my function is doing:

# MRE
today = pandas.Timestamp.now(tz='America/Toronto').normalize()
print(f'mre: {today + pandas.tseries.offsets.BusinessMonthBegin(-4)}')

Update: this morning, the MRE returned the expected timestamp:

2021-11-01 00:00:00-04:00

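Since it rolled over on the second day of the month rather than the first, is the first day of the month perhaps implicitly included when BusinessMonthBegin is calculated?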

What am I missing?

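If the date falls exactly on the offset, adding a negative offset already gives you the previous business month begin date (for example, 2022-02-01 is a business month begin date):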

import pandas as pd

t_on_offset = pd.Timestamp('2022-02-01')
t_after_offset = pd.Timestamp('2022-02-02')

## on the offset, the offset addition will go back one month already:
t_on_offset + pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-01-03 00:00:00')

# it seems what you actually want here is
# t_on_offset + pd.tseries.offsets.BusinessMonthBegin(0)

# this just rolls back to the beginning of the BM:
t_after_offset + pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-02-01 00:00:00')
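
To put the cases side by side, here is a small sketch that contrasts n=-1 and n=0 on and after the offset. It only uses the pandas calls shown above; `bmb` is just shorthand introduced for this sketch. It relies on the pandas convention that an offset with n=0 leaves an on-offset date unchanged and rolls any other date forward.

```python
import pandas as pd

bmb = pd.tseries.offsets.BusinessMonthBegin  # shorthand for this sketch only

t_on_offset = pd.Timestamp('2022-02-01')     # first business day of February 2022
t_after_offset = pd.Timestamp('2022-02-02')  # one day later

# n=-1 on the offset: steps back a whole business month
on_minus_1 = t_on_offset + bmb(-1)
# n=-1 after the offset: only rolls back to the start of the current business month
after_minus_1 = t_after_offset + bmb(-1)
# n=0 on the offset: the date is left unchanged
on_zero = t_on_offset + bmb(0)
# n=0 after the offset: the date is rolled forward to the next business month begin
after_zero = t_after_offset + bmb(0)

print(on_minus_1, after_minus_1, on_zero, after_zero)
```

So BusinessMonthBegin(0) only gives you the current business month start when the timestamp is already on the offset; on any other day it moves forward, not back.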

You can check whether you are on the offset like this:
pd.tseries.offsets.BusinessMonthBegin().rollback(t_on_offset) == t_on_offset
# True

pd.tseries.offsets.BusinessMonthBegin().rollback(t_after_offset) == t_after_offset
# False
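
As an alternative to comparing against rollback, offsets also expose an is_on_offset method. The sketch below assumes a pandas version that has it (1.1 or newer, as far as I know; older versions called it onOffset) and simply prints both checks next to each other.

```python
import pandas as pd

offset = pd.tseries.offsets.BusinessMonthBegin()

for ts in (pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02')):
    # rollback leaves a timestamp untouched only if it is already on the offset
    via_rollback = offset.rollback(ts) == ts
    # is_on_offset asks the same question directly
    via_method = offset.is_on_offset(ts)
    print(ts.date(), via_rollback, via_method)
```

Both checks should agree; which one you use is mostly a matter of taste.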

So for your BMS function example (slightly refactored), it could look like this:

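def BMS(timestamp, multiplier, normalize=True, format='ms'):
    # On the offset, a negative multiplier already steps back a full business
    # month, so take one step fewer to treat the current month start as step one.
    if pd.tseries.offsets.BusinessMonthBegin().rollback(timestamp) == timestamp:
        if multiplier < 0:
            multiplier += 1
    obj = timestamp + pd.tseries.offsets.BusinessMonthBegin(multiplier)

    if normalize:
        obj = obj.normalize()

    if format == 'ms':
        # integer epoch milliseconds
        return int(obj.timestamp() * 1000)

    return obj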

In action:

for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
    # timestamp comes first, then the multiplier, matching the signature above
    print(f'{str(t)} -> my function: {BMS(t, -4, format=None)}')

2022-01-31 00:00:00 -> my function: 2021-10-01 00:00:00
2022-02-01 00:00:00 -> my function: 2021-11-01 00:00:00
2022-02-02 00:00:00 -> my function: 2021-11-01 00:00:00
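
If you would rather avoid the special case, another way to express the same idea is to snap to the current business month start first and then step back from there. This is only a sketch under my own assumptions: `bms_via_rollback` is a hypothetical name, it handles negative multipliers only, and it skips the 'ms' formatting.

```python
import pandas as pd

def bms_via_rollback(timestamp, multiplier):
    """Hypothetical variant of BMS; negative multipliers only."""
    if multiplier >= 0:
        raise ValueError('this sketch only handles negative multipliers')
    bmb = pd.tseries.offsets.BusinessMonthBegin
    # Step 1: snap to the start of the current business month
    # (a no-op if the timestamp is already on the offset).
    anchor = bmb().rollback(timestamp)
    # Step 2: the anchor is on the offset, so every remaining step
    # moves back exactly one business month.
    return (anchor + bmb(multiplier + 1)).normalize()

for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
    print(f'{t} -> {bms_via_rollback(t, -4)}')
```

For the three dates above this should give the same results as the refactored BMS, since rolling back first counts as the first of the requested steps.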