Forecasting percentages in R

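I have been tasked with finding a solution to a problem that involves forecasting percentages. My main problem is a lack of data: I only have 2.5 years of weekly data and need to forecast the percentages for the rest of this year.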

The data looks like this:

    week  year    date          percentage
1   1     2019    2019-03-31    0.1068
2   2     2019    2019-04-07    0.0954
3   3     2019    2019-04-14    0.0845
4   4     2019    2019-04-21    0.0713
5   5     2019    2019-04-28    0.0762
6   6     2019    2019-05-05    0.0671

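By its nature, the percentage does show some seasonality, but from some EDA it is not strong enough to class this as a fully seasonal dataset.

I initially tried an LSTM / Keras sequential model, but that proved unsuccessful.

I'm not familiar with any methods that can handle this kind of data, so if anyone has ideas on how best to approach this task, I'd be grateful.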

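You can start with the fable package and its ecosystem. Note that this is only an example, and bear in mind that the sample data you provided is probably too small to give meaningful results.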

library(fable)
library(tsibble)  

# convert the date column from character to Date
df$date <- as.Date(df$date, "%Y-%m-%d")

# convert to a tsibble, the time-series data frame used throughout the
# fable/tsibble ecosystem; it also helps a lot if you have many series to forecast
df <- as_tsibble(df, index = date)

# split the data into train and test sets: forecasting values you already
# know helps you judge which model forecasts well
train <- df[df$date <  as.Date('2019-04-21',"%Y-%m-%d"),]
test  <- df[df$date >= as.Date('2019-04-21',"%Y-%m-%d"),]

# fit the models; the native pipe |> needs R >= 4.1, on older versions
# use %>% instead (library(magrittr))
training <- train |> 
              # define models, you can put many
              model(arima   = ARIMA(percentage),
                    croston = CROSTON(percentage))
training
    # A mable: 1 x 2
           arima   croston
         <model>   <model>
1 <ARIMA(0,2,0)> <croston>


forecasting <- training |> 
              # forecast 3 steps (weeks) ahead
              forecast(h = 3)

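# plot the forecasts over the training data (plot not shown here, because
# with so little data it is not informative); the variable is named
# explicitly because autolayer() would otherwise pick the first numeric column
autoplot(forecasting) + autolayer(train, percentage)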

# and some accuracy metrics
accuracy(forecasting, test) 
# A tibble: 2 x 10
  .model  .type       ME   RMSE    MAE   MPE  MAPE  MASE RMSSE   ACF1
  <chr>   <chr>    <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl> <dbl>  <dbl>
1 arima   Test   0.00883 0.0119 0.0104  12.4  14.6   NaN   NaN -0.116
2 croston Test  -0.0184  0.0188 0.0184 -26.1  26.1   NaN   NaN -0.525
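To make the columns of the accuracy table above more concrete, here is a minimal base R sketch of how the point-forecast metrics are computed from forecast errors (actual minus forecast). The `actual` values are the three held-out rows of the sample data; the `predicted` vector is only a stand-in (the last training value repeated), an assumption for illustration and not the output of either fitted model.

```r
# Held-out values from the sample data (rows 4 to 6)
actual <- c(0.0713, 0.0762, 0.0671)

# Stand-in forecast (assumption): repeat the last training value, 0.0845
predicted <- rep(0.0845, length(actual))

# Forecast errors: actual minus forecast
err <- actual - predicted

metrics <- c(
  ME   = mean(err),                      # mean error (bias)
  RMSE = sqrt(mean(err^2)),              # root mean squared error
  MAE  = mean(abs(err)),                 # mean absolute error
  MPE  = mean(err / actual) * 100,       # mean percentage error
  MAPE = mean(abs(err / actual)) * 100   # mean absolute percentage error
)

print(round(metrics, 4))
```

The NaN values in the MASE and RMSSE columns are most likely because those two metrics are scaled by errors on the training data, which `test` alone does not contain; passing the full `df` to `accuracy()` instead should fill them in.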

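Obviously, you would choose the best model for each time series (just one in this case) and use it to forecast the horizon you need.

In this simple example ARIMA(0,2,0) appears to be the best, so you could do the following, although you can find better ways to forecast in the guide: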

df |> model(arima_0_2_0 = ARIMA(percentage ~ 0  + pdq(0,2,0))) |> forecast(h = 10)

Some models let you include regressors, so you could try modelling "odd" periods such as COVID lockdowns, holidays, and so on, if needed.
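As a rough sketch of the regressor idea, the example below fits `TSLM()` from fable with a 0/1 indicator column. Everything about the data is an assumption made for illustration: the series is simulated, and `lockdown` is a hypothetical column marking an unusual period. The key point is that future values of the regressor have to be supplied through `new_data` when forecasting.

```r
library(fable)
library(tsibble)

# Simulated weekly series (hypothetical, for illustration only)
set.seed(1)
n <- 60
dates <- as.Date("2019-03-31") + 7 * (0:(n - 1))

# Hypothetical indicator: 1 during an "odd" period (weeks 41 to 50), else 0
lockdown <- as.integer(seq_len(n) > 40 & seq_len(n) <= 50)

# Percentage drops by 0.03 during the flagged weeks, plus noise
percentage <- 0.10 - 0.03 * lockdown + rnorm(n, sd = 0.005)

ts_data <- tsibble(date = dates, percentage = percentage,
                   lockdown = lockdown, index = date)

# Regression model with the indicator as a regressor
fit <- ts_data |>
  model(reg = TSLM(percentage ~ lockdown))

# Future values of the regressor must be provided: assume no lockdown ahead
future <- tsibble(date = max(dates) + 7 * (1:4), lockdown = 0L, index = date)

fc <- forecast(fit, new_data = future)
fc
```

`ARIMA()` accepts regressors through the same formula syntax, for example `ARIMA(percentage ~ lockdown)`, in which case the ARIMA part models the errors of the regression.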


With data:

df <- read.table(text = '
week  year    date          percentage
1   1     2019    2019-03-31    0.1068
2   2     2019    2019-04-07    0.0954
3   3     2019    2019-04-14    0.0845
4   4     2019    2019-04-21    0.0713
5   5     2019    2019-04-28    0.0762
6   6     2019    2019-05-05    0.0671', header = TRUE)