Finding the previous trading day in pandas is very slow
I have this df:
data1_txt = """
date
7/8/2021
7/6/2021
6/29/2021
"""
I need to get the previous trading day for each date. I managed to do this with the pandas_market_calendars Python package. Here is my full code:
import io
import pandas_market_calendars as mcal
from pandas.tseries.offsets import CustomBusinessDay
import datetime
import pandas as pd
data1_txt = """
date
7/8/2021
7/6/2021
6/29/2021
"""
df = pd.read_fwf(io.StringIO(data1_txt))
df['date'] = pd.to_datetime(df['date'])
nyse = mcal.get_calendar('NYSE')
holidays = nyse.holidays()
holidays = list(holidays.holidays)
US_BUSINESS_DAY = CustomBusinessDay(holidays=holidays)
df['date_prev'] = df['date'] - 1 * US_BUSINESS_DAY
The code does the job, but on large datasets it is very slow. Is there some way to speed it up?
P.S. When I run the code, Python gives me this warning:
PerformanceWarning: Non-vectorized DateOffset being applied to Series or DatetimeIndex
warnings.warn(
I actually found a way that makes use of np.busday_offset:
import numpy as np

def other(df):
    nyse = mcal.get_calendar('NYSE')
    holidays = nyse.holidays().holidays
    # check out the docs for how to adjust roll to your preference
    result = np.busday_offset(df["date"].values.astype('datetime64[D]'),
                              [-1], roll="forward", weekmask="1111100", holidays=holidays)
    return result
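If the function is called many times, it may help to build the business-day calendar once and reuse it. The following is a minimal sketch of that idea, not part of the timings reported here. The one-entry holiday list is a hypothetical stand-in for the NYSE holidays, and the extra Saturday date is only there to show what roll="forward" does.

```python
import numpy as np

# Hypothetical holiday list for illustration (assumption); in practice it
# would come from mcal.get_calendar('NYSE').holidays().holidays.
holidays = np.array(["2021-07-05"], dtype="datetime64[D]")

# Build the calendar once; weekmask "1111100" marks Mon-Fri as business days.
cal = np.busdaycalendar(weekmask="1111100", holidays=holidays)

dates = np.array(["2021-07-08", "2021-07-06", "2021-06-29", "2021-07-10"],
                 dtype="datetime64[D]")

# roll="forward" first moves a non-business day (here Saturday 2021-07-10)
# to the next business day, then the -1 offset steps back one business day.
prev_days = np.busday_offset(dates, -1, roll="forward", busdaycal=cal)
print(prev_days)
```

When busdaycal is passed, weekmask and holidays must be left out of the np.busday_offset call, since the calendar object already carries both.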
I created a df with 10,000 rows of random dates and passed it to both your code (yours) and the other function. The timing went from
Stat(s) for 10 execution(s) of yours:
mean: 893.01282 ms
median: 883.1989 ms
stdv: 83.3837 ms
max: 1134.7835 ms
min: 760.0869 ms
to:
Stat(s) for 10 execution(s) of other:
mean: 278.60783 ms
median: 274.44165 ms
stdv: 27.9785 ms
max: 330.3079 ms
min: 235.4329 ms
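The harness that produced these numbers is not shown. One way such a comparison could be reproduced is sketched below, under some assumptions: the short holiday list is hypothetical (so the sketch runs without pandas_market_calendars), the names bench_df, with_offset and with_numpy are made up for illustration, and timeit stands in for whatever timing tool was actually used. It also checks that both approaches return the same dates.

```python
import timeit
import warnings

import numpy as np
import pandas as pd
from pandas.tseries.offsets import CustomBusinessDay

# Hypothetical holiday list (assumption); swap in the NYSE holidays in practice.
holidays = ["2021-01-01", "2021-07-05", "2021-12-24"]

# 10,000 random dates in 2021 (fixed seed so the run is repeatable).
rng = np.random.default_rng(0)
days = rng.integers(0, 365, size=10_000)
bench_df = pd.DataFrame(
    {"date": pd.Timestamp("2021-01-01") + pd.to_timedelta(days, unit="D")}
)

def with_offset(df):
    # Non-vectorized path: pandas applies the offset element by element.
    bday = CustomBusinessDay(holidays=holidays)
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")  # silence the PerformanceWarning
        return df["date"] - 1 * bday

def with_numpy(df):
    # Vectorized path: a single NumPy call over the whole array.
    return np.busday_offset(df["date"].values.astype("datetime64[D]"), -1,
                            roll="forward", weekmask="1111100",
                            holidays=np.array(holidays, dtype="datetime64[D]"))

# Compare at day resolution, since the two paths return different dtypes.
same = (with_offset(bench_df).values.astype("datetime64[D]")
        == with_numpy(bench_df)).all()
print("results agree:", same)

for fn in (with_offset, with_numpy):
    seconds = timeit.timeit(lambda: fn(bench_df), number=3) / 3
    print(fn.__name__, round(seconds * 1000, 2), "ms per call")
```

Absolute timings will differ by machine and library version, so only the relative gap between the two functions is meaningful.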
And using your sample data, both functions return the same dates:
data1_txt = """
date
7/8/2021
7/6/2021
6/29/2021
"""
df = pd.read_fwf(io.StringIO(data1_txt))
df['date'] = pd.to_datetime(df['date'])
def yours(df):
    nyse = mcal.get_calendar('NYSE')
    holidays = nyse.holidays()
    holidays = list(holidays.holidays)
    US_BUSINESS_DAY = CustomBusinessDay(holidays=holidays)
    result = df['date'] - 1 * US_BUSINESS_DAY
    return result
display(yours(df), other(df))
>>>
0 2021-07-07
1 2021-07-02
2 2021-06-28
Name: date, dtype: datetime64[ns]
array(['2021-07-07', '2021-07-02', '2021-06-28'], dtype='datetime64[D]')
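Note that other returns a NumPy array of dtype datetime64[D] rather than a pandas Series. To store it as the date_prev column from the question, the array can be cast before assignment. This is a small sketch, again with a hypothetical one-entry holiday list in place of the NYSE holidays:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["7/8/2021", "7/6/2021", "6/29/2021"])})

# Hypothetical stand-in for the NYSE holiday list (assumption).
holidays = np.array(["2021-07-05"], dtype="datetime64[D]")

prev = np.busday_offset(df["date"].values.astype("datetime64[D]"), -1,
                        roll="forward", weekmask="1111100", holidays=holidays)

# Cast the day-resolution array to nanoseconds so pandas stores it as an
# ordinary datetime column.
df["date_prev"] = prev.astype("datetime64[ns]")
print(df)
```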