pandas timeseries offset BusinessMonthBegin doesn't roll over on the new month
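I have a few pandas functions that collect the relative start dates of previous time periods. Today, at the start of a new month, I noticed that my business month start (BMS) function returned an unexpected timestamp: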
# so.py
import pandas
import time

def now(format='ms', normalize=True):
    obj = pandas.Timestamp.now(tz='America/Toronto').normalize()
    if normalize == False:
        obj = pandas.Timestamp.now(tz='America/Toronto')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

def BMS(multiplier, format='ms'):
    obj = now(format=None) + pandas.tseries.offsets.BusinessMonthBegin(multiplier)
    obj = pandas.Timestamp(obj).floor(freq='D')
    if format == 'ms':
        obj = int(time.mktime(obj.timetuple()) * 1000)
    return(obj)

print(f'my function: {BMS(-4, format=None)}')
# python3 so.py
# my function: 2021-10-01 00:00:00-04:00
This is unexpected because it is the same timestamp that was returned yesterday:
yesterday = pandas.Timestamp.now(tz='America/Toronto').normalize() - pandas.Timedelta(days=1)
print(f'yesterday: {yesterday + pandas.tseries.offsets.BusinessMonthBegin(-4)}')
# yesterday: 2021-10-01 00:00:00-04:00
Since today is the start of a new month, I expected BMS(-4, format=None) to return
2021-11-01 00:00:00-04:00
In case it helps, a more basic MRE that reproduces what my function is doing looks like this:
# MRE
today = pandas.Timestamp.now(tz='America/Toronto').normalize()
print(f'mre: {today + pandas.tseries.offsets.BusinessMonthBegin(-4)}')
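Update
This morning, the MRE returned the expected timestamp:
2021-11-01 00:00:00-04:00
Since it rolled over on the second day of the month rather than the first, is the first day of the month perhaps implicitly included when BusinessMonthBegin is calculated?
What am I missing?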
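If the date falls exactly on the offset, adding a negative offset already takes you to the previous business month begin date (for example, 2022-02-01 is a business month begin date):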
import pandas as pd
t_on_offset = pd.Timestamp('2022-02-01')
t_after_offset = pd.Timestamp('2022-02-02')
## on the offset, the offset addition will go back one month already:
t_on_offset + pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-01-03 00:00:00')
# it seems what you actually want here is
# t_on_offset + pd.tseries.offsets.BusinessMonthBegin(0)
# this just rolls back to the beginning of the BM:
t_after_offset + pd.tseries.offsets.BusinessMonthBegin(-1)
# Timestamp('2022-02-01 00:00:00')
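To make the effect on the question's -4 multiplier easier to see, the sketch below lists the results for n = -1 through -4 around the month boundary. The helper name `steps_back` is made up for this illustration, and the dates are timezone-naive for simplicity.

```python
import pandas as pd

def steps_back(start, max_n=4):
    """Return start + BusinessMonthBegin(-n) for n = 1..max_n as ISO date strings.

    Hypothetical helper, only meant to illustrate the rollover behaviour.
    """
    bmb = pd.tseries.offsets.BusinessMonthBegin
    return [(start + bmb(-n)).date().isoformat() for n in range(1, max_n + 1)]

for day in ('2022-01-31', '2022-02-01', '2022-02-02'):
    print(day, '->', steps_back(pd.Timestamp(day)))
# 2022-01-31 -> ['2022-01-03', '2021-12-01', '2021-11-01', '2021-10-01']
# 2022-02-01 -> ['2022-01-03', '2021-12-01', '2021-11-01', '2021-10-01']
# 2022-02-02 -> ['2022-02-01', '2022-01-03', '2021-12-01', '2021-11-01']
```

On 2022-02-01 the first step does not "use up" the current month start, because the date is already on the offset. That is why the result only changes on the second day of the month.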
You can check whether you are on the offset like this:
pd.tseries.offsets.BusinessMonthBegin().rollback(t_on_offset) == t_on_offset
# True
pd.tseries.offsets.BusinessMonthBegin().rollback(t_after_offset) == t_after_offset
# False
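As a side note, offsets also expose an `is_on_offset` method that should give the same answer without the comparison (assuming a pandas version that provides it; older releases called it `onOffset`). The sketch below also includes a month that starts on a weekend, where the business month begin is not the first calendar day. The wrapper name `on_business_month_begin` is hypothetical.

```python
import pandas as pd

bmb = pd.tseries.offsets.BusinessMonthBegin()

def on_business_month_begin(ts):
    """Hypothetical wrapper: True if ts falls on a business month begin date."""
    return bool(bmb.is_on_offset(ts))

print(on_business_month_begin(pd.Timestamp('2022-02-01')))  # True  (Tuesday)
print(on_business_month_begin(pd.Timestamp('2022-02-02')))  # False
print(on_business_month_begin(pd.Timestamp('2022-01-01')))  # False (Saturday)
print(on_business_month_begin(pd.Timestamp('2022-01-03')))  # True  (first business day of January 2022)
```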
So for your BMS function example (slightly refactored), it could look like this:
def BMS(timestamp, multiplier, normalize=True, format='ms'):
    # on the offset, a negative multiplier already steps back one period,
    # so compensate by moving it one step towards zero
    if pd.tseries.offsets.BusinessMonthBegin().rollback(timestamp) == timestamp:
        if multiplier < 0:
            multiplier += 1
    obj = timestamp + pd.tseries.offsets.BusinessMonthBegin(multiplier)
    if normalize:
        obj = obj.normalize()
    if format == 'ms':
        return obj.timestamp() * 1000
    return(obj)
In action:
for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
    # the timestamp comes first, then the multiplier, matching the signature above
    print(f'{str(t)} -> my function: {BMS(t, -4, format=None)}')
2022-01-31 00:00:00 -> my function: 2021-10-01 00:00:00
2022-02-01 00:00:00 -> my function: 2021-11-01 00:00:00
2022-02-02 00:00:00 -> my function: 2021-11-01 00:00:00
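An alternative that avoids the special case might be to roll back to the current business month start first and only then step back whole periods. This is a minimal sketch under that assumption, not a drop-in replacement: the name `business_month_start` and the `periods_back` convention (1 means the most recent business month start, 4 corresponds to the -4 used above) are invented for this example.

```python
import pandas as pd

def business_month_start(ts, periods_back=1):
    """Hypothetical helper: the business month start `periods_back` periods ago.

    periods_back=1 is the most recent business month start on or before ts.
    """
    if periods_back < 1:
        raise ValueError('periods_back must be >= 1')
    bmb = pd.tseries.offsets.BusinessMonthBegin
    # rollback leaves ts unchanged when it is already on the offset
    start = bmb().rollback(ts.normalize())
    if periods_back == 1:
        return start
    # start is on the offset, so each negative step moves back exactly one period
    return start + bmb(-(periods_back - 1))

for t in pd.Timestamp('2022-01-31'), pd.Timestamp('2022-02-01'), pd.Timestamp('2022-02-02'):
    print(f'{t} -> {business_month_start(t, 4)}')
# 2022-01-31 00:00:00 -> 2021-10-01 00:00:00
# 2022-02-01 00:00:00 -> 2021-11-01 00:00:00
# 2022-02-02 00:00:00 -> 2021-11-01 00:00:00
```

Because the starting point is always on the offset after the rollback, the result no longer depends on which day of the month the function is called.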