Facebook NeuralProphet - adding holidays
I have a common dataset for my predictions that includes data from across the globe.
ds y country_id
01/01/2021 09:00:00 5.0 1
01/01/2021 09:10:00 5.2 1
01/01/2021 09:20:00 5.4 1
01/01/2021 09:30:00 6.1 1
01/01/2021 09:00:00 2.0 2
01/01/2021 09:10:00 2.2 2
01/01/2021 09:20:00 2.4 2
01/01/2021 09:30:00 3.1 2
playoffs = pd.DataFrame({
    'holiday': 'playoff',
    'ds': pd.to_datetime(['2008-01-13', '2009-01-03', '2010-01-16',
                          '2010-01-24', '2010-02-07', '2011-01-08',
                          '2013-01-12', '2014-01-12', '2014-01-19',
                          '2014-02-02', '2015-01-11', '2016-01-17',
                          '2016-01-24', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
superbowls = pd.DataFrame({
    'holiday': 'superbowl',
    'ds': pd.to_datetime(['2010-02-07', '2014-02-02', '2016-02-07']),
    'lower_window': 0,
    'upper_window': 1,
})
holidays = pd.concat((playoffs, superbowls))
Now I would like to add holidays to the model.
m = NeuralProphet(holidays=holidays)
m.add_country_holidays(country_name='US')
m.fit(df)
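- How can I add holidays for multiple countries with add_country_holidays (m.add_country_holidays)?
- How can I add country-specific holidays to the holidays data?
- Do I need to build separate country-specific models? Or is one model for the entire dataset fine, with regressors added afterwards? Any recommendations?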
Here is a possible solution:
Program:
# NOTE 1: tested on google colab
# Un-comment the following (!pip) line if you need to install the libraries
# on google colab notebook:
#!pip install neuralprophet pandas numpy holidays
import pandas as pd
import numpy as np
import holidays
from neuralprophet import NeuralProphet
import datetime
# NOTE 2: Most of the code comes from:
# https://neuralprophet.com/html/events_holidays_peyton_manning.html
# Context:
# We will use the time series of the log daily page views for the Wikipedia
# page for Peyton Manning (former American football quarterback) as an example.
# During playoffs and Super Bowls, Peyton Manning's wiki page is viewed more
# frequently. We would like to see if country-specific holidays also have an
# influence.
# First, we load the data:
data_location = "https://raw.githubusercontent.com/ourownstory/neuralprophet-data/main/datasets/"
df = pd.read_csv(data_location + "wp_log_peyton_manning.csv")
# To simulate your case, we add a country_id column filled with random values {1,2}
# Let's assume US=1 and Canada=2
np.random.seed(0)
df['country_id']=np.random.randint(1,2+1,df['ds'].count())
print("The dataframe we are working on:")
print(df.head())
# We would like to add holidays for the US and Canada to see if holidays have an
# influence on the number of daily views of Manning's wiki page.
# The data in df starts in 2007 and ends in 2016:
StartingYear=2007
LastYear=2016
# Holidays for US and Canada:
US_holidays = holidays.US(years=[year for year in range(StartingYear, LastYear+1)])
CA_holidays = holidays.CA(years=[year for year in range(StartingYear, LastYear+1)])
holidays_US = pd.DataFrame({'ds': [], 'event': []})
holidays_CA = pd.DataFrame({'ds': [], 'event': []})
for i in df.index:
    # Split the date string (YYYY-MM-DD) into [year, month, day]:
    datetimeobj = [int(x) for x in df['ds'][i].split('-')]
    # Check if the corresponding day is a holiday in the US:
    if df['country_id'][i] == 1 and (datetime.datetime(*datetimeobj) in US_holidays):
        d = {'ds': [df['ds'][i]], 'event': ['holiday_US']}
        df1 = pd.DataFrame(data=d)
        # If yes: add to holidays_US
        # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
        holidays_US = pd.concat([holidays_US, df1], ignore_index=True)
    # Check if the corresponding day is a holiday in Canada:
    if df['country_id'][i] == 2 and (datetime.datetime(*datetimeobj) in CA_holidays):
        d = {'ds': [df['ds'][i]], 'event': ['holiday_CA']}
        df1 = pd.DataFrame(data=d)
        # If yes: add to holidays_CA
        holidays_CA = pd.concat([holidays_CA, df1], ignore_index=True)
# Now we can drop the country_id in df:
df.drop('country_id', axis=1, inplace=True)
print("Days in df that are holidays in the US:")
print(holidays_US.head())
print()
print("Days in df that are holidays in Canada:")
print(holidays_CA.head())
# user specified events
# history events
playoffs = pd.DataFrame({
    'event': 'playoff',
    'ds': pd.to_datetime([
        '2008-01-13', '2009-01-03', '2010-01-16',
        '2010-01-24', '2010-02-07', '2011-01-08',
        '2013-01-12', '2014-01-12', '2014-01-19',
        '2014-02-02', '2015-01-11', '2016-01-17',
        '2016-01-24', '2016-02-07',
    ]),
})
superbowls = pd.DataFrame({
    'event': 'superbowl',
    'ds': pd.to_datetime([
        '2010-02-07', '2012-02-05', '2014-02-02',
        '2016-02-07',
    ]),
})
# Create the events_df:
events_df = pd.concat((playoffs, superbowls, holidays_US, holidays_CA))
# Create neural network and fit:
# NeuralProphet Object
m = NeuralProphet(loss_func="MSE")
m = m.add_events("playoff")
m = m.add_events("superbowl")
m = m.add_events("holiday_US")
m = m.add_events("holiday_CA")
# create the data df with events
history_df = m.create_df_with_events(df, events_df)
# fit the model
metrics = m.fit(history_df, freq="D")
# forecast with events known ahead
future = m.make_future_dataframe(df=history_df, events_df=events_df, periods=365, n_historic_predictions=len(df))
forecast = m.predict(df=future)
fig = m.plot(forecast)
fig_param = m.plot_parameters()
fig_comp = m.plot_components(forecast)
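Result: The results (see the PARAMETERS figure) seem to indicate that when a day is a holiday, there are fewer views in both the US and Canada. Does that make sense? Maybe... It seems plausible that people on holiday have more interesting things to do than browse Manning's wiki page :-) I don't know.
Program output: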
The dataframe we are working on:
ds y country_id
0 2007-12-10 9.5908 1
1 2007-12-11 8.5196 2
2 2007-12-12 8.1837 2
3 2007-12-13 8.0725 1
4 2007-12-14 7.8936 2
Days in df that are holidays in the US:
ds event
0 2007-12-25 holiday_US
1 2008-01-21 holiday_US
2 2008-07-04 holiday_US
3 2008-11-27 holiday_US
4 2008-12-25 holiday_US
Days in df that are holidays in Canada:
ds event
0 2008-01-01 holiday_CA
1 2008-02-18 holiday_CA
2 2008-08-04 holiday_CA
3 2008-09-01 holiday_CA
4 2008-10-13 holiday_CA
INFO - (NP.utils.set_auto_seasonalities) - Disabling daily seasonality. Run NeuralProphet with daily_seasonality=True to override this.
INFO - (NP.config.set_auto_batch_epoch) - Auto-set batch_size to 32
INFO - (NP.config.set_auto_batch_epoch) - Auto-set epochs to 138
88%
241/273 [00:02<00:00, 121.69it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.51E+00
88%
241/273 [00:02<00:00, 123.87it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.36E-02, min: 1.63E+00
89%
242/273 [00:02<00:00, 121.58it/s]
INFO - (NP.utils_torch.lr_range_test) - lr-range-test results: steep: 3.62E-02, min: 2.58E+00
INFO - (NP.forecaster._init_train_loader) - lr-range-test selected learning rate: 3.44E-02
Epoch[138/138]: 100%|██████████| 138/138 [00:29<00:00, 4.74it/s, MSELoss=0.012, MAE=0.344, RMSE=0.478, RegLoss=0]
Figures:
Forecast:
Parameters:
Components:
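As an alternative to the row-by-row loop in the program above, the per-country holiday events can also be collected with boolean masks. The following is only a minimal sketch: the helper name holiday_events and its calendars/labels arguments are hypothetical, and the two small date sets are stand-ins so the example runs without extra packages. In practice, the objects returned by holidays.US(years=...) and holidays.CA(years=...) could be passed in their place, since they support the same "date in calendar" check used in the program above. The time-of-day part of ds is dropped before the lookup, so the same idea should also apply to sub-daily rows such as the 10-minute data in the question.

```python
import datetime

import pandas as pd


def holiday_events(df, calendars, labels):
    """Return a (ds, event) frame with the rows of df that fall on a holiday
    in that row's own country.

    calendars: {country_id: container of datetime.date}  (hypothetical argument)
    labels:    {country_id: event name}                   (hypothetical argument)
    """
    # Reduce every timestamp to its calendar date before the lookup.
    days = pd.to_datetime(df["ds"]).dt.date
    frames = []
    for country_id, calendar in calendars.items():
        # Rows that belong to this country AND fall on one of its holidays.
        mask = (df["country_id"] == country_id) & days.map(lambda d: d in calendar)
        frames.append(
            pd.DataFrame({"ds": df.loc[mask, "ds"], "event": labels[country_id]})
        )
    return pd.concat(frames, ignore_index=True)


# Stand-in calendars (assumption: 1 = US, 2 = Canada, as in the program above).
calendars = {
    1: {datetime.date(2007, 12, 25), datetime.date(2008, 7, 4)},
    2: {datetime.date(2007, 12, 25), datetime.date(2008, 7, 1)},
}
labels = {1: "holiday_US", 2: "holiday_CA"}

# Illustrative rows only; the real df comes from the dataset loaded above.
df = pd.DataFrame({
    "ds": ["2007-12-25", "2007-12-26", "2008-07-01", "2008-07-04", "2008-07-04"],
    "country_id": [1, 2, 2, 1, 2],
})

events = holiday_events(df, calendars, labels)
print(events)
```

The resulting frame has the same ds/event layout as holidays_US and holidays_CA, so it could be concatenated with playoffs and superbowls into events_df in the same way.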