Hierarchical Forecasting using R
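I am using the fable package to forecast a hierarchical time series in which not all nodes have equal depth.
The use case is forecasting contacts at the Country -> State -> District level. When aggregated, the forecasts must add up to the country level (the lower-level forecasts must sum to the higher-level forecasts).
https://robjhyndman.com/papers/Foresight-hts-final.pdf
Below is the code I tried when forecasting on the test data.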
library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)
# selecting train data
train_df <- tourism %>%
  filter(year(Quarter) <= 2014 & Region %in% c("MacDonnell", "Melbourne"))
# selecting test data
test_df <- tourism %>%
  filter(year(Quarter) > 2014 & Region %in% c("MacDonnell", "Melbourne"))
# fitting ets model with reconciliation
ets_fit <- train_df %>%
  aggregate_key(Purpose * (State / Region), Trips = sum(Trips)) %>%
  model(ets = ETS(Trips)) %>%
  reconcile(ets_adjusted = min_trace(ets))
# forecasting on test data (this call raises the error below)
fcasts_test <- forecast(ets_fit, test_df)
I get the following error:
Error: Provided data contains a different key structure to the models.
Run `rlang::last_error()` to see where the error occurred.
How can I resolve this?
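You changed the key structure with aggregate_key() before fitting the model, so the key structure of the fitted models no longer matches that of the test set. You need to create the test set after calling aggregate_key().
However, you cannot filter by one of the keys after creating the aggregations, because the aggregation information would then be incomplete. Filter by Region first, then aggregate, then split by time.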
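To see why the key structures differ, it can help to compare the number of series before and after aggregation. The following is a minimal sketch, not part of the original answer; the object names `raw` and `agg` are illustrative assumptions, and it relies on `n_keys()` from tsibble and `is_aggregated()` from fabletools (attached with fable).

```r
library(fable)    # attaches fabletools, which provides aggregate_key()
library(tsibble)  # provides the tourism data set and n_keys()
library(dplyr)

# Bottom-level series only: one series per Region x Purpose combination
raw <- tourism %>%
  filter(Region %in% c("MacDonnell", "Melbourne"))

# Same data with the aggregated series added at every level
agg <- raw %>%
  aggregate_key(Purpose * (State / Region), Trips = sum(Trips))

# The aggregated tsibble holds more series than the raw one
n_keys(raw)
n_keys(agg)

# The extra series are the ones flagged as aggregated in a key column
agg %>%
  filter(is_aggregated(Region)) %>%
  distinct(Purpose, State, Region)
```

A model fitted on `agg` has one model per series in `agg`, so new data passed to forecast() presumably has to carry those same aggregated series, which raw `tourism` rows do not.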
Here is an example that does what you want.
library(fable)
library(tsibble)
library(tsibbledata)
library(lubridate)
library(dplyr)
# Aggregate data as required
agg_tourism <- tourism %>%
  filter(Region %in% c("MacDonnell", "Melbourne")) %>%
  aggregate_key(Purpose * (State / Region), Trips = sum(Trips))
# Select training data
train_df <- agg_tourism %>%
  filter(year(Quarter) <= 2014)
# Select test data
test_df <- agg_tourism %>%
  filter(year(Quarter) > 2014)
# Fit ets model with reconciliation
ets_fit <- train_df %>%
  model(ets = ETS(Trips)) %>%
  reconcile(ets_adjusted = min_trace(ets))
# forecasting on test data
fcasts_test <- forecast(ets_fit, test_df)
fcasts_test
#> # A fable: 600 x 7 [1Q]
#> # Key: Purpose, State, Region, .model [50]
#> Purpose State Region .model Quarter Trips .mean
#> <chr*> <chr*> <chr*> <chr> <qtr> <dist> <dbl>
#> 1 Business Northern Territory MacDonnell ets 2015 Q1 N(5.1, 21) 5.12
#> 2 Business Northern Territory MacDonnell ets 2015 Q2 N(5.1, 21) 5.12
#> 3 Business Northern Territory MacDonnell ets 2015 Q3 N(5.1, 21) 5.12
#> 4 Business Northern Territory MacDonnell ets 2015 Q4 N(5.1, 21) 5.12
#> 5 Business Northern Territory MacDonnell ets 2016 Q1 N(5.1, 21) 5.12
#> 6 Business Northern Territory MacDonnell ets 2016 Q2 N(5.1, 21) 5.12
#> 7 Business Northern Territory MacDonnell ets 2016 Q3 N(5.1, 21) 5.12
#> 8 Business Northern Territory MacDonnell ets 2016 Q4 N(5.1, 21) 5.12
#> 9 Business Northern Territory MacDonnell ets 2017 Q1 N(5.1, 21) 5.12
#> 10 Business Northern Territory MacDonnell ets 2017 Q2 N(5.1, 21) 5.12
#> # … with 590 more rows
fcasts_test %>%
  filter(Region == "Melbourne", Purpose == "Visiting") %>%
  autoplot(agg_tourism)
Created on 2020-12-26 by the reprex package (v0.3.0)
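Since the question requires the lower-level forecasts to add up to the top level, one way to check this is to compare the reconciled point forecasts directly. The sketch below is an illustration under assumptions rather than part of the original answer: it refits the same models, forecasts with `h = "3 years"` instead of passing the test set, and uses illustrative names (`fc`, `recon`, `top`, `bottom`, `check`).

```r
library(fable)
library(tsibble)
library(lubridate)
library(dplyr)

# Aggregate first, then split by time (same steps as the example above)
agg_tourism <- tourism %>%
  filter(Region %in% c("MacDonnell", "Melbourne")) %>%
  aggregate_key(Purpose * (State / Region), Trips = sum(Trips))

fc <- agg_tourism %>%
  filter(year(Quarter) <= 2014) %>%
  model(ets = ETS(Trips)) %>%
  reconcile(ets_adjusted = min_trace(ets)) %>%
  forecast(h = "3 years")

# Keep only the reconciled forecasts, as a plain tibble
recon <- fc %>%
  as_tibble() %>%
  filter(.model == "ets_adjusted")

# Top-level series: every key is aggregated
top <- recon %>%
  filter(is_aggregated(Purpose), is_aggregated(State), is_aggregated(Region)) %>%
  select(Quarter, top = .mean)

# Bottom-level series: neither Purpose nor Region is aggregated
bottom <- recon %>%
  filter(!is_aggregated(Purpose), !is_aggregated(Region)) %>%
  group_by(Quarter) %>%
  summarise(bottom_sum = sum(.mean))

# The gap should be zero up to floating-point error
check <- inner_join(top, bottom, by = "Quarter") %>%
  mutate(gap = top - bottom_sum)
max(abs(check$gap))
```

Running the same comparison with `.model == "ets"` should generally show a non-zero gap, since the unreconciled base forecasts are produced independently for each series.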