Forecasting percentages in R
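My task is to find a solution to a problem that involves forecasting percentages.
My main problem is a lack of data: I only have 2.5 years of weekly data and need to forecast the percentages for the rest of this year.
The data looks similar to the following: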
  week year       date percentage
1    1 2019 2019-03-31     0.1068
2    2 2019 2019-04-07     0.0954
3    3 2019 2019-04-14     0.0845
4    4 2019 2019-04-21     0.0713
5    5 2019 2019-04-28     0.0762
6    6 2019 2019-05-05     0.0671
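By its nature the percentage does show some seasonality, but judging from some EDA it is not strong enough to class this as a fully seasonal dataset.
I initially tried an LSTM / Keras sequential model, but that did not work well.
I am not familiar with any methods that can handle this kind of data, so if anyone has ideas on how best to approach this task, I would be grateful.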
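You can start with the fable package and its ecosystem. Note that this is only an example, and keep in mind that the sample data you provided is too small to give meaningful results.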
library(fable)
library(tsibble)

# Convert the date column from character to Date
df$date <- as.Date(df$date, "%Y-%m-%d")

# Coerce to a tsibble, the time-series data frame used by fable and related
# packages; it also helps a lot if you have many series to forecast
df <- as_tsibble(df, index = date)

# Split into train and test sets: forecasting values you already know
# helps you see which model forecasts well
train <- df[df$date < as.Date("2019-04-21"), ]
test <- df[df$date >= as.Date("2019-04-21"), ]

# Fit the models; replace |> with %>% (for example via library(magrittr))
# if your R is older than 4.1
training <- train |>
  # define the models, you can list several
  model(arima = ARIMA(percentage),
        croston = CROSTON(percentage))

training
#> # A mable: 1 x 2
#>            arima   croston
#>          <model>   <model>
#> 1 <ARIMA(0,2,0)> <croston>

forecasting <- training |>
  # forecast 3 steps ahead
  forecast(h = 3)

# Plot the forecasts over the training data (plot not shown: with this little
# data it is not informative); percentage is named explicitly because train
# has several numeric columns
autoplot(forecasting) + autolayer(train, percentage)

# Accuracy metrics on the test set
accuracy(forecasting, test)
#> # A tibble: 2 x 10
#>   .model  .type       ME   RMSE    MAE   MPE  MAPE  MASE RMSSE   ACF1
#>   <chr>   <chr>    <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
#> 1 arima   Test   0.00883 0.0119 0.0104  12.4  14.6   NaN   NaN -0.116
#> 2 croston Test  -0.0184  0.0188 0.0184 -26.1  26.1   NaN   NaN -0.525
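Clearly, you would pick the best model for each time series (in this case there is only one) and refit it to forecast what you need.
In this simple example ARIMA(0,2,0) seems to be the best, so you could do the following, although the guide describes better ways of refitting: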
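As a small sketch of that selection step, the accuracy table printed above can be ranked programmatically. The data frame below simply retypes the RMSE and MAE values from that output. Using RMSE as the selection criterion is an assumption, and another metric may suit your problem better.

```r
# Accuracy values copied from the accuracy() output above
acc <- data.frame(
  .model = c("arima", "croston"),
  RMSE   = c(0.0119, 0.0188),
  MAE    = c(0.0104, 0.0184)
)

# Keep the model with the smallest test-set RMSE (assumed criterion)
best <- acc$.model[which.min(acc$RMSE)]
best
#> [1] "arima"
```

The same indexing should work directly on the tibble returned by accuracy(forecasting, test), since it has the same .model and RMSE columns.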
df |> model(arima_0_2_0 = ARIMA(percentage ~ 0 + pdq(0,2,0))) |> forecast(h = 10)
Some models accept regressors, so if needed you can try to model "unusual" periods such as COVID lockdowns, holidays, and so on.
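The regressor idea can be sketched roughly as follows. Everything here is illustrative: the series is simulated rather than taken from the question, the lockdown dummy and its window (weeks 30 to 40) are hypothetical, and TSLM() is used only because it is one of the simplest fable models that takes regressors. ARIMA() should accept a regressor on the right-hand side of the formula in the same way.

```r
library(fable)
library(tsibble)
library(dplyr)

# Simulated weekly series (NOT the asker's data), 60 weeks long
set.seed(1)
n <- 60
# Hypothetical dummy: 1 during an assumed "lockdown" window, 0 otherwise
lockdown <- as.integer(seq_len(n) %in% 30:40)

sim <- data.frame(
  date       = as.Date("2019-03-31") + 7 * (0:(n - 1)),
  lockdown   = lockdown,
  percentage = 0.10 + 0.03 * lockdown + rnorm(n, sd = 0.005)
) |>
  as_tsibble(index = date)

# Linear model with the dummy as a regressor
fit <- sim |>
  model(lm_dummy = TSLM(percentage ~ lockdown))

# Future values of the regressor must be supplied when forecasting;
# here we assume there is no lockdown in the next 4 weeks
future <- new_data(sim, n = 4) |>
  mutate(lockdown = 0L)

fc <- forecast(fit, new_data = future)
fc
```

Because the regressor has to be known for the forecast horizon, this works best for events you can schedule in advance, such as holidays. For something like a lockdown you would have to forecast under an assumed scenario.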
With the data:
df <- read.table(text = '
week year date percentage
1 1 2019 2019-03-31 0.1068
2 2 2019 2019-04-07 0.0954
3 3 2019 2019-04-14 0.0845
4 4 2019 2019-04-21 0.0713
5 5 2019 2019-04-28 0.0762
6 6 2019 2019-05-05 0.0671', header = TRUE)