Forecasting irregular stock data with ARIMA and tsibble

I want to forecast a specific stock with ARIMA, in a way similar to what R. Hyndman does in FPP3.

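The first problem I ran into is that stock data are obviously irregular, because stock exchanges are closed on weekends and on some holidays. This causes problems if I want to use functions from the tidyverts packages: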

> stock
# A tsibble: 750 x 6 [1D]
   Date        Open  High   Low Close Volume
   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2019-05-21  36.3  36.4  36.3  36.4    232
 2 2019-05-22  36.4  37.0  36.4  36.8   1007
 3 2019-05-23  36.7  36.8  36.1  36.1   4298
 4 2019-05-24  36.4  36.5  36.4  36.4    452
 5 2019-05-27  36.5  36.5  36.3  36.4   2032
 6 2019-05-28  36.5  36.8  36.4  36.5   3049
 7 2019-05-29  36.2  36.5  36.1  36.5   2962
 8 2019-05-30  36.8  37.1  36.8  37.1    432
 9 2019-05-31  36.8  37.4  36.8  37.4   8424
10 2019-06-03  37.3  37.5  37.2  37.3   1550
# ... with 740 more rows


> stock %>%
+ feasts::ACF(difference(Close)) %>%
+ autoplot()

Error in `check_gaps()`:
! .data contains implicit gaps in time. You should check your data and convert implicit gaps into explicit missing values using `tsibble::fill_gaps()` if required.

The same implicit-gap error is raised by other functions, such as fable::ARIMA() or feasts::gg_tsdisplay().
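To see what "implicit gaps" means for a daily index: the tsibble has interval [1D], so every calendar day between the first and last date is expected to have a row, and weekends are simply absent. The base R sketch below checks for this by hand; the weekday-only calendar is a hypothetical assumption (real exchange holidays are ignored), and in tsibble itself `has_gaps()` and `scan_gaps()` report the same information.

```r
# Hypothetical trading calendar: weekdays only, exchange holidays ignored
dates <- seq(as.Date("2019-05-20"), as.Date("2019-06-07"), by = "day")
wday <- as.POSIXlt(dates)$wday            # 0 = Sunday, 6 = Saturday
trading <- dates[!wday %in% c(0, 6)]

# A daily index is regular only if consecutive dates differ by exactly one day
steps <- as.integer(diff(trading))
has_implicit_gaps <- any(steps > 1)

# Trading dates that are followed by at least one missing calendar day
gap_starts <- trading[c(steps > 1, FALSE)]

has_implicit_gaps
gap_starts
```

Each gap here is a weekend, which is exactly what `check_gaps()` complains about.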

I tried filling the gaps with the values from the preceding rows:

stock %>%
  group_by_key() %>%
  fill_gaps() %>%
  tidyr::fill(Close, .direction = "down")

# A tsibble: 1,096 x 6 [1D]
   Date        Open  High   Low Close Volume
   <date>     <dbl> <dbl> <dbl> <dbl>  <dbl>
 1 2019-05-21  36.3  36.4  36.3  36.4    232
 2 2019-05-22  36.4  37.0  36.4  36.8   1007
 3 2019-05-23  36.7  36.8  36.1  36.1   4298
 4 2019-05-24  36.4  36.5  36.4  36.4    452
 5 2019-05-25  NA    NA    NA    36.4     NA
 6 2019-05-26  NA    NA    NA    36.4     NA
 7 2019-05-27  36.5  36.5  36.3  36.4   2032
 8 2019-05-28  36.5  36.8  36.4  36.5   3049
 9 2019-05-29  36.2  36.5  36.1  36.5   2962
10 2019-05-30  36.8  37.1  36.8  37.1    432
# ... with 1,086 more rows

Everything works. My question is:

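First, you are evidently using an old version of the feasts package: the current version gives a warning rather than an error when the ACF is computed from data with implicit gaps.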

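Second, the answer depends on what kind of analysis you want to do. You have three options:

  1. Use calendar days as the time index and fill the gaps with NA;
  2. Use calendar days as the time index and fill the gaps with the previous closing price;
  3. Use trading days as the time index, so there are no gaps at all.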

Here are the results for each option, using Apple stock over 2014 to 2018 as an example.

library(fpp3)
#> ── Attaching packages ─────────────────────────────────────── fpp3 0.4.0.9000 ──
#> ✔ tibble      3.1.7     ✔ tsibble     1.1.1
#> ✔ dplyr       1.0.9     ✔ tsibbledata 0.4.0
#> ✔ tidyr       1.2.0     ✔ feasts      0.2.2
#> ✔ lubridate   1.8.0     ✔ fable       0.3.1
#> ✔ ggplot2     3.3.6     ✔ fabletools  0.3.2
#> ── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
#> ✖ lubridate::date()    masks base::date()
#> ✖ dplyr::filter()      masks stats::filter()
#> ✖ tsibble::intersect() masks base::intersect()
#> ✖ tsibble::interval()  masks lubridate::interval()
#> ✖ dplyr::lag()         masks stats::lag()
#> ✖ tsibble::setdiff()   masks base::setdiff()
#> ✖ tsibble::union()     masks base::union()

1. Fill non-trading days with missing values

stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  tsibble(index = Date, regular = TRUE) %>%
  fill_gaps()
stock
#> # A tsibble: 1,825 x 8 [1D]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
#>  3 <NA>   2014-01-04  NA    NA    NA    NA        NA          NA
#>  4 <NA>   2014-01-05  NA    NA    NA    NA        NA          NA
#>  5 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
#>  6 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
#>  7 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
#>  8 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
#>  9 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
#> 10 <NA>   2014-01-11  NA    NA    NA    NA        NA          NA
#> # … with 1,815 more rows

stock %>%
  model(ARIMA(Close ~ pdq(d=1)))
#> A mable: 1 x 1
#>  `ARIMA(Close ~ pdq(d = 1))`
#>                      <model>
#> 1              <ARIMA(0,1,0)>

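In this case, the ACF calculation uses the longest contiguous stretch of non-missing values, which is too short to be meaningful, so there is no point in showing the output of ACF() or gg_tsdisplay(). In addition, the automatic selection of the differencing order in the ARIMA model fails because of the missing values, so I set it to 1 manually. The other parts of the ARIMA model work fine in the presence of missing values.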
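Why the contiguous stretch is so short, and why a fixed-order model still fits, can be sketched in base R. The prices below are simulated (a hypothetical random walk, not AAPL data) and weekends are set to NA as in option 1. `stats::na.contiguous()` keeps the longest run without NAs, which is at most one trading week; `stats::arima()` can still fit a fixed ARIMA(0,1,0) because it evaluates the likelihood with a Kalman filter that accommodates missing values.

```r
set.seed(1)
# Four full calendar weeks starting on a Monday (2014-01-06)
dates <- seq(as.Date("2014-01-06"), by = "day", length.out = 28)
wday <- as.POSIXlt(dates)$wday          # 0 = Sunday, 6 = Saturday

# Hypothetical closing prices; weekends become explicit NA (option 1)
close <- 100 + cumsum(rnorm(length(dates)))
close[wday %in% c(0, 6)] <- NA

# Longest stretch without NA: a single Monday-to-Friday block
longest <- na.contiguous(close)
length(longest)

# base R acf() restricted to that stretch can estimate only a few lags
acf_short <- acf(close, na.action = na.contiguous, plot = FALSE)

# A fixed-order model is still estimable with the gaps left in place
fit <- arima(close, order = c(0, 1, 0))
fit$sigma2
```

With real holidays the longest stretch can be shorter still, which is why an ACF under option 1 says little.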

2. Fill non-trading days with the last observed value

stock <- stock %>%
  tidyr::fill(Close, .direction = "down")
stock
#> # A tsibble: 1,825 x 8 [1D]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900
#>  3 <NA>   2014-01-04  NA    NA    NA    77.3      NA          NA
#>  4 <NA>   2014-01-05  NA    NA    NA    77.3      NA          NA
#>  5 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700
#>  6 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300
#>  7 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400
#>  8 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200
#>  9 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000
#> 10 <NA>   2014-01-11  NA    NA    NA    76.1      NA          NA
#> # … with 1,815 more rows

stock %>%
  ACF(difference(Close)) %>%
  autoplot()

stock %>%
  model(ARIMA(Close))
#> # A mable: 1 x 1
#>   `ARIMA(Close)`
#>          <model>
#> 1 <ARIMA(0,1,0)>

stock %>%
  gg_tsdisplay(Close)
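`tidyr::fill(.direction = "down")` is a last-observation-carried-forward fill. The base R sketch below reproduces that behaviour with a hypothetical helper `locf()` (not part of any package) on the first six AAPL closes shown above, and makes one side effect visible: every filled day contributes a difference of exactly zero, so the ACF and the ARIMA fit in option 2 see many artificial zero daily changes.

```r
# Hypothetical helper: carry the last non-missing value forward
locf <- function(x) {
  # Position of the most recent non-missing value at each index (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0] <- NA   # leading NAs have nothing to carry forward
  x[idx]
}

# Closes for 2014-01-02 to 2014-01-07, with the weekend left as NA
close <- c(79.0, 77.3, NA, NA, 77.7, 77.1)

filled <- locf(close)
filled          # the Saturday and Sunday take the Friday close
diff(filled)    # the two weekend steps are exactly zero
```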

3. Re-index by trading day

stock <- gafa_stock %>%
  filter(Symbol == "AAPL") %>%
  tsibble(index = Date, regular = TRUE) %>%
  mutate(trading_day = row_number()) %>%
  tsibble(index = trading_day)
stock
#> # A tsibble: 1,258 x 9 [1]
#>    Symbol Date        Open  High   Low Close Adj_Close    Volume trading_day
#>    <chr>  <date>     <dbl> <dbl> <dbl> <dbl>     <dbl>     <dbl>       <int>
#>  1 AAPL   2014-01-02  79.4  79.6  78.9  79.0      67.0  58671200           1
#>  2 AAPL   2014-01-03  79.0  79.1  77.2  77.3      65.5  98116900           2
#>  3 AAPL   2014-01-06  76.8  78.1  76.2  77.7      65.9 103152700           3
#>  4 AAPL   2014-01-07  77.8  78.0  76.8  77.1      65.4  79302300           4
#>  5 AAPL   2014-01-08  77.0  77.9  77.0  77.6      65.8  64632400           5
#>  6 AAPL   2014-01-09  78.1  78.1  76.5  76.6      65.0  69787200           6
#>  7 AAPL   2014-01-10  77.1  77.3  75.9  76.1      64.5  76244000           7
#>  8 AAPL   2014-01-13  75.7  77.5  75.7  76.5      64.9  94623200           8
#>  9 AAPL   2014-01-14  76.9  78.1  76.8  78.1      66.1  83140400           9
#> 10 AAPL   2014-01-15  79.1  80.0  78.8  79.6      67.5  97909700          10
#> # … with 1,248 more rows

stock %>%
  ACF(difference(Close)) %>%
  autoplot()

stock %>%
  model(ARIMA(Close))
#> # A mable: 1 x 1
#>   `ARIMA(Close)`
#>          <model>
#> 1 <ARIMA(2,1,3)>

stock %>%
  gg_tsdisplay(Close)
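With a trading-day index, forecasts are labelled by trading-day number rather than by date, so mapping them back to the calendar needs a list of future trading days. The base R sketch below assumes that future trading days are simply weekdays (exchange holidays are ignored), uses a hypothetical helper `next_weekdays()`, and fits `stats::arima()` to a simulated random walk rather than the AAPL series.

```r
# Hypothetical helper: the next h weekdays after last_date (holidays ignored)
next_weekdays <- function(last_date, h) {
  # 2h + 7 calendar days always contain at least h weekdays
  candidates <- last_date + seq_len(2 * h + 7)
  wday <- as.POSIXlt(candidates)$wday   # 0 = Sunday, 6 = Saturday
  head(candidates[!wday %in% c(0, 6)], h)
}

set.seed(42)
# Hypothetical closes on 20 consecutive trading days ending Friday 2014-01-10
close <- 100 + cumsum(rnorm(20))
last_date <- as.Date("2014-01-10")

# Indexed by trading day, the series is regular: no gap handling is needed
fit <- arima(close, order = c(0, 1, 0))
fc <- predict(fit, n.ahead = 5)

data.frame(
  trading_day = length(close) + 1:5,
  date = next_weekdays(last_date, 5),
  mean = as.numeric(fc$pred)
)
```

In practice an exchange calendar would replace the weekday rule, since a holiday inside the horizon would otherwise shift every later date by one day.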

Created on 2022-05-22 by the reprex package (v2.0.1)