R auto.arima forecast
I want to create a forecast and chose auto.arima. After fitting the model, I am unable to compute the forecast for 2 more articles:
my_forecast <- ts(frc$sales_30, frequency = 12)
my_forecast <- tsclean(my_forecast)
fit <- auto.arima(my_forecast)
But I have 100 articles, and I need a forecast for all of them by name (format: year, month, sales, article).
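The typical workflow for this task in R is list-based: you split the data by article into list items and apply functions to each item. As you may already have realized, the year and month columns are irrelevant here, because the time series is generated from the frequency argument of ts().
This example therefore uses only articles A and B, each with a made-up vector of monthly sales that we assume is already sorted by date.
I will not go into the technical details of time-series analysis/prediction, and will focus instead on the process/code for producing multiple forecasts from a df that contains all articles (or any other grouping level) and their sales history. I did not use tsclean(), but the workflow should make it clear how to include it: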
library(forecast)
library(tidyverse)
# set up some dummy data (no clear seasonal pattern, but good enough for a demo)
## note: the data is randomly generated, so your numbers will most likely differ from mine; call set.seed() first if you need reproducible values
df <- data.frame(article = c(rep("A", 24), rep("B", 24)),
                 sales = c(sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE),
                           sample(seq(from = 20, to = 50, by = 5), size = 24, replace = TRUE)))
# build the grouping inside the df/tibble
dfg <- df %>%
  dplyr::group_by(article)
# split the grouped df into a list, one element per group
dfl <- dfg %>%
  dplyr::group_split(.keep = FALSE)
# set list names according to article value (not needed, but might be helpful)
names(dfl) <- dplyr::group_keys(dfg)$article
# apply ts function with frequency 12 to the list items
dflt <- lapply(dfl, ts, frequency = 12)
# apply the auto.arima to build list of models
dfltm <- lapply(dflt, forecast::auto.arima)
# apply forecast with horizon 2 on the list of final models from auto.arima
predictions <- lapply(dfltm, forecast::forecast, h = 2)
# print results
predictions
$A
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 3 34.79167 22.47636 47.10697 15.95703 53.6263
Feb 3 34.79167 22.47636 47.10697 15.95703 53.6263
$B
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 3 34.58333 20.32802 48.83865 12.78171 56.38496
Feb 3 34.58333 20.32802 48.83865 12.78171 56.38496
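For completeness, here is a rough sketch of how tsclean() from the question could be slotted into the list workflow. The demo data, the seed, the artificial outlier and the artificial gap are all my own assumptions for illustration. I pass the plain sales vector to ts() so that tsclean() receives a univariate series.

```r
library(forecast)

# hypothetical demo data with the same columns as df above (article, sales)
set.seed(1)  # assumption: arbitrary seed, only to make the demo repeatable
df <- data.frame(article = rep(c("A", "B"), each = 24),
                 sales = sample(seq(20, 50, by = 5), size = 48, replace = TRUE))
df$sales[5]  <- 500  # artificial outlier in article A
df$sales[30] <- NA   # artificial gap in article B

# split the sales vector by article, giving a named list
sales_list <- split(df$sales, df$article)

# build a univariate monthly ts per article, then clean it;
# tsclean() interpolates missing values and replaces detected outliers
cleaned <- lapply(sales_list, function(x) tsclean(ts(x, frequency = 12)))

# same steps as before: one model and one 2-step forecast per article
models <- lapply(cleaned, auto.arima)
predictions <- lapply(models, forecast, h = 2)
```

Whether a given value is treated as an outlier depends on the series, so it is worth comparing the cleaned and raw series before trusting the result.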
A more modern way to do the same thing is to use nested lists inside a tibble:
# build list inside the tibble/df by existing groupings
npd <- tidyr::nest(dfg) %>%
  # generate new column of ts series data
  dplyr::mutate(tsdata = purrr::map(data, ~ ts(.x, frequency = 12)),
                # use auto.arima on the data to build new column of final auto.arima models
                models = purrr::map(tsdata, ~ forecast::auto.arima(.x)),
                # generate forecast as new column
                predictions = purrr::map(models, ~ forecast::forecast(.x, h = 2)))
# print prediction results
npd$predictions
[[1]]
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 3 34.79167 22.47636 47.10697 15.95703 53.6263
Feb 3 34.79167 22.47636 47.10697 15.95703 53.6263
[[2]]
Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
Jan 3 34.58333 20.32802 48.83865 12.78171 56.38496
Feb 3 34.58333 20.32802 48.83865 12.78171 56.38496
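The question asks for the result in the format year, month, sales, article. One possible way to get there from a list of forecast objects is sketched below. The helper to_rows() is a hypothetical name of mine, and the start date of January 2020 is purely an assumption; without a start argument ts() counts years from 1, which is why the printed output shows "Jan 3".

```r
library(forecast)

# hypothetical demo data with the same columns as df above (article, sales)
set.seed(1)  # assumption: arbitrary seed, only to make the demo repeatable
df <- data.frame(article = rep(c("A", "B"), each = 24),
                 sales = sample(seq(20, 50, by = 5), size = 48, replace = TRUE))
sales_list <- split(df$sales, df$article)

# assumption: every history starts in January 2020
predictions <- lapply(sales_list, function(x) {
  forecast(auto.arima(ts(x, frequency = 12, start = c(2020, 1))), h = 2)
})

# hypothetical helper: turn one forecast object into rows of
# year, month, sales (point forecast), article
to_rows <- function(fc, article) {
  data.frame(year = as.integer(floor(time(fc$mean) + 1e-8)),
             month = as.integer(cycle(fc$mean)),
             sales = as.numeric(fc$mean),
             article = article)
}

# stack the per-article rows into one data frame
result <- do.call(rbind, Map(to_rows, predictions, names(predictions)))
rownames(result) <- NULL
result
```

Only the point forecast is kept here; the interval bounds are available in fc$lower and fc$upper if you need them as extra columns.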
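As mentioned at the outset, ts() works from a frequency rather than a date column. This means you must make sure that months without sales are listed, and that every article has a complete timeline in ascending (chronological) order. Missing values must be filled in before you build the time-series object.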
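A minimal sketch of that preparation step, assuming your raw data has a date column: tidyr::complete() adds the missing article/month combinations, and arrange() puts each article in chronological order. The raw data frame, its column names and the choice to fill missing months with 0 are all assumptions for illustration; filling with NA and interpolating later may suit your data better.

```r
library(dplyr)
library(tidyr)

# hypothetical raw data: rows are out of order and article B has no row for 2021-02
raw <- data.frame(
  article = c("A", "B", "A", "B", "A"),
  month   = as.Date(c("2021-03-01", "2021-01-01", "2021-01-01",
                      "2021-03-01", "2021-02-01")),
  sales   = c(30, 25, 20, 45, 35)
)

full <- raw %>%
  # add every article/month combination between the first and last month;
  # assumption: a month without a row means zero sales
  tidyr::complete(article,
                  month = seq(min(month), max(month), by = "month"),
                  fill = list(sales = 0)) %>%
  # make sure each article is in ascending date order before calling ts()
  dplyr::arrange(article, month)

full
```

After this step each article has the same number of rows, so the sales column can be split by article and passed to ts() as in the examples above.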
Finally, I strongly recommend the open book by the author of the forecast package, which can be found here: https://otexts.com/fpp2/