从贝叶斯结构时间序列模型恢复拟合值的两种方法产生不同的结果

Two methods of recovering fitted values from a Bayesian Structural Time Series model yield different results

从 bsts 模型中检索给定 y[t-1] 的 y[t] 的样本内预测(或“条件期望”)的两种概念上合理的方法产生不同的结果,我不明白为什么。

一种方法使用 bsts 返回的预测误差(定义为 e=y[t] - E(y[t]|y[t-1]);来源:https://rdrr.io/cran/bsts/man/one.step.prediction.errors.html ):

library(bsts)
get_yhats1 <- function(fit){
  # One step prediction errors defined as e=y[t] - yhat (source: )
  # Recover yhat by y-e
  
  bsts.pred.errors <- bsts.prediction.errors(fit, burn=SuggestBurn(0.1, fit))$in.sample
  predictions <- t(apply(bsts.pred.errors, 1, function(e){fit$original.series-e}))
  return(predictions)
}

另一个总结所有模型组件在时间 t 的贡献。

get_yhats2 <- function(fit){
  burn <- SuggestBurn(0.1, fit)
  X     <- fit$state.contributions
  niter <- dim(X)[1]
  ncomp <- dim(X)[2]
  nobs  <- dim(X)[3]
  
  # initialize final fit/residuals matrices with zeros
  predictions <- matrix(data = 0, nrow = niter - burn, ncol = nobs)
  p0 <- predictions
  
  comps <- seq_len(ncomp)
  for (comp in comps) {
    # pull out the state contributions for this component and transpose to
    # a niter x (nobs - burn) array
    compX <- X[-seq_len(burn), comp, ]
    
    # accumulate the predictions across each component
    predictions <- predictions + compX
  }
  
  return(predictions)
}

拟合模型:

## Air passengers data
data("AirPassengers")
# 11 years, monthly data (timestep=monthly) --> 132 observations
Y <- stats::window(AirPassengers, start=c(1949,1), end=c(1959,12))
y <- log(Y)

ss <- AddLocalLinearTrend(list(), y)
ss <- AddSeasonal(ss, y, nseasons=12, season.duration=1)
bsts.model <- bsts(y, state.specification=ss, niter=500, family='gaussian')

使用每个函数计算和比较预测

p1 <- get_yhats1(bsts.model)
p2 <- get_yhats2(bsts.model)

# Compare predictions for t=1:5, first MCMC iteration:
p1[1,1:5]; p2[1,1:5]

我是bsts的作者

bsts中的'prediction errors'来自过滤分布。也就是说,它们来自 p(state | past data)。状态贡献来自平滑分布,即 p(state | all data)。过滤分布向后看,而平滑分布既向前又向后看。人们通常在使用拟合模型时需要过滤分布,而在首先拟合模型时需要平滑分布。