Set priors for multiple predictors in rstanarm?
I'm a bit confused about how to set priors for multiple predictors in the following model:
require(rstanarm)
wi_prior <- normal(0, sd(train$attendance))
SEED <- 101
fmla <- attendance ~ (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
                        WSWin1 | franchID)
baylm <- stan_glmer(fmla,
                    data = train,
                    family = "gaussian",
                    algorithm = "sampling",
                    adapt_delta = .95,
                    prior_intercept = wi_prior, seed = SEED)
As requested, here is the first observation in train.
train <- structure(list(franchID = structure(25L, .Label = c("ANA", "ARI",
"ATL", "BAL", "BOS", "CHC", "CHW", "CIN", "CLE", "COL", "DET",
"FLA", "HOU", "KCR", "LAD", "MIL", "MIN", "NYM", "NYY", "OAK",
"PHI", "PIT", "SDP", "SEA", "SFG", "STL", "TBD", "TEX", "TOR",
"WSN"), class = "factor"), yearID = 1999L, name = "San Francisco Giants",
park = "3Com Park", attendance = 2078399L, W = 86L, W1 = 89L,
W2 = 90L, W3 = 68L, WCWin1 = FALSE, WCWin2 = FALSE, WCWin3 = FALSE,
DivWin1 = FALSE, DivWin2 = TRUE, DivWin3 = FALSE, LgWin1 = FALSE,
LgWin2 = FALSE, LgWin3 = FALSE, WSWin1 = FALSE, WSWin2 = FALSE,
WSWin3 = FALSE), .Names = c("franchID", "yearID", "name",
"park", "attendance", "W", "W1", "W2", "W3", "WCWin1", "WCWin2",
"WCWin3", "DivWin1", "DivWin2", "DivWin3", "LgWin1", "LgWin2",
"LgWin3", "WSWin1", "WSWin2", "WSWin3"), row.names = c(NA, -1L
), class = "data.frame")
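You can specify priors for the coefficients of K predictors by passing a vector of length K to one of the supported prior distributions. For example, if K = 4 you could do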
wi_prior2 <- normal(location = c(0, 1, -2, 5))
You can also pass a vector of scales and/or use a family other than normal. You would then call stan_glmer with prior = wi_prior2. If you instead do
wi_prior2 <- normal(location = 0)
then the same prior will be used for all K common coefficients.
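As a minimal sketch of how a length-K vector lines up with the predictors, the base R snippet below builds the design matrix for the common part of a formula and counts its non-intercept columns. The toy data frame and the location and scale values are made up for illustration, and the snippet assumes the prior vector is matched to the coefficients in the order of these columns.

```r
# Toy data (hypothetical values) with the same column types as the question:
# integer win counts and a logical division-winner flag.
toy <- data.frame(
  attendance = c(2078399, 1850000, 2400000, 1600000),
  W          = c(86L, 75L, 97L, 70L),
  W1         = c(89L, 80L, 90L, 65L),
  DivWin1    = c(FALSE, FALSE, TRUE, FALSE)
)

# Columns of the design matrix for the common (non-group-specific) part
X <- model.matrix(attendance ~ W + W1 + DivWin1, data = toy)
coef_names <- setdiff(colnames(X), "(Intercept)")
K <- length(coef_names)
print(coef_names)  # a logical predictor shows up with a TRUE suffix

# One location and one scale per coefficient, in the order of coef_names
# (illustrative numbers only, not recommendations)
prior_location <- setNames(c(0, 0, 0), coef_names)
prior_scale    <- setNames(c(10000, 10000, 500000), coef_names)
stopifnot(length(prior_location) == K, length(prior_scale) == K)

# With rstanarm installed, the vectors can be handed to normal()
if (requireNamespace("rstanarm", quietly = TRUE)) {
  wi_prior2 <- rstanarm::normal(location = unname(prior_location),
                                scale    = unname(prior_scale),
                                autoscale = FALSE)
}
```

Naming the vectors by coefficient is only a bookkeeping aid here; the names are dropped before the vectors are passed on.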
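However, in your case, I suspect fmla is wrong. You typically want to also include most, if not all, of these predictors outside the lme4-style parenthesized expression, to allow for a common effect across all levels of franchID. Thus, fmla would become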
fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
WSWin1 | franchID)
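If you only include the part in parentheses, you are assuming that the coefficients on these variables are exactly zero in the population and only deviate from zero in the subpopulations defined by the levels of franchID. In that case there are no common coefficients to place a prior on.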
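The prior on the group-specific deviations from the common coefficients is conditionally multivariate normal with mean vector zero and a somewhat complicated but unknown covariance structure. This is explained in more detail in help(priors, package = "rstanarm").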
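Putting the pieces together, a call might look roughly like the sketch below. It wraps the fit in a hypothetical helper, fit_attendance(), so that nothing is sampled until you pass in your full data. The prior values are placeholders rather than recommendations, and decov() is shown with what I understand to be its default arguments, only to make explicit where the prior on the group-level covariance enters.

```r
# Corrected formula from above: 8 common coefficients plus
# franchise-specific deviations for the intercept and every slope.
fmla <- attendance ~ W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
  WSWin1 + (1 + W + W1 + W2 + W3 + DivWin1 + DivWin2 + DivWin3 +
              WSWin1 | franchID)

# Hypothetical helper (not part of rstanarm): fits the model on `train`.
fit_attendance <- function(train, seed = 101) {
  # Same prior for all 8 common coefficients (placeholder values)
  coef_prior <- rstanarm::normal(location = rep(0, 8),
                                 scale = rep(2.5, 8),
                                 autoscale = TRUE)
  # Intercept prior as in the question
  int_prior <- rstanarm::normal(0, sd(train$attendance), autoscale = FALSE)
  # Prior on the covariance of the group-specific deviations
  cov_prior <- rstanarm::decov(regularization = 1, concentration = 1,
                               shape = 1, scale = 1)

  rstanarm::stan_glmer(fmla,
                       data = train,
                       family = gaussian(),
                       algorithm = "sampling",
                       adapt_delta = 0.95,
                       prior = coef_prior,
                       prior_intercept = int_prior,
                       prior_covariance = cov_prior,
                       seed = seed)
}

# Usage, assuming `train` holds the full data set (not just one row):
# baylm <- fit_attendance(train)
# prior_summary(baylm)  # reports the priors that were actually used
```

Checking prior_summary() on the fitted object is a cheap way to confirm that the vector of priors was applied to the coefficients you intended.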