从 glmmTMB 输出中提取后验模式和可信区间

Question

我通常使用 lme4 包，但 glmmTMB 包越来越适合处理高度复杂的数据（想想过度分散 and/or 零-inflation ).

Is there a way to extract posterior modes and credible intervals from glmmTMB models, similar to how it is done for lme4 models (example ).

详情：

我正在处理计数数据（可用 here) that are zero-inflated and overdispersed and have random effects. The package best suited to work with this sort of data is the glmmTMB (details here）。（注意两个异常值：euc0==78 和 np_other_grass==20）。

数据如下所示：

euc0 ea_grass ep_grass np_grass np_other_grass month year precip season   prop_id quad
 3      5.7      0.0     16.7            4.0     7 2006    526 Winter    Barlow    1
 0      6.7      0.0     28.3            0.0     7 2006    525 Winter    Barlow    2
 0      2.3      0.0      3.3            0.0     7 2006    524 Winter    Barlow    3
 0      1.7      0.0     13.3            0.0     7 2006    845 Winter    Blaber    4
 0      5.7      0.0     45.0            0.0     7 2006    817 Winter    Blaber    5
 0     11.7      1.7     46.7            0.0     7 2006    607 Winter    DClark    3

glmmTMB型号：

model<-glmmTMB(euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1|prop_id), data = euc, family= nbinom2) #nbimom2 lets var increases quadratically
summary(model)
confint(model) #this gives the confidence intervals

我通常如何提取 lmer/glmer 模型的后验模式和可信区间：

#extracting model estimates and credible intervals
sm.model <-arm::sim(model, n.sim=1000)
smfixef.model = sm.model@fixef
smfixef.model =coda::as.mcmc(smfixef.model)
MCMCglmm::posterior.mode(smfixef.model)  #mode of the distribution
coda::HPDinterval(smfixef.model)  #credible intervals

#among-brood variance
bid<-sm.model@ranef$prop_id[,,1]
bvar<-as.vector(apply(bid, 1, var)) #between brood variance posterior distribution
bvar<-coda::as.mcmc(bvar)
MCMCglmm::posterior.mode(bvar) #mode of the distribution
coda::HPDinterval(bvar) #credible intervals

Answer 1

大部分回答：

获取条件模型参数的多元正态样本非常容易（我认为这就是arm::sim()正在做的事情。

library(MASS)
pp <- fixef(model)$cond
vv <- vcov(model)$cond
samp <- MASS::mvrnorm(1000, mu=pp, Sigma=vv)

（然后使用上面的其余方法）。

我有点怀疑你的第二个例子是否在做你想让它做的事。条件模式的方差 不一定是组间方差的良好估计 （例如，参见 here）。此外，我对半评估的贝叶斯方法感到紧张（例如，为什么没有先验？为什么要看后验模式，这在贝叶斯上下文中很少是有意义的值？）尽管我自己有时也会使用类似的方法！）但是，使用 glmmTMB 结果进行正确的马尔可夫链分析 Monte Carlo 并不难：

library(tmbstan)
library(rstan)
library(coda)
library(emdbook) ## for lump.mcmc.list(), or use runjags::combine.mcmc()

t2 <- system.time(m2 <- tmbstan(model$obj))
m3 <- rstan::As.mcmc.list(m2)
lattice::xyplot(m3,layout=c(5,6))
m4 <- emdbook::lump.mcmc.list(m3)
coda::HPDinterval(m4)

知道 m4 的 theta 列是组间标准差的对数可能会有所帮助...

（有关更多信息，请参阅 vignette("mcmc", package="glmmTMB")...）

Answer 2

我认为 Ben 已经回答了你的问题，所以我的回答并没有增加讨论的内容......也许只是一件事，正如你在评论中所写的那样，你对内部和之间感兴趣 -组方差。您可以通过 parameters::random_parameters() 获取这些信息（如果我没有误解您要查找的内容）。请参阅下面的示例，该示例首先从多元正态分布中生成模拟样本（就像 Ben 的示例中一样），然后为您提供随机效应方差的摘要...

library(readr)
library(glmmTMB)
library(parameters)
library(bayestestR)
library(insight)

euc_data <- read_csv("D:/Downloads/euc_data.csv")
model <-
  glmmTMB(
    euc0 ~ ea_grass + ep_grass + np_grass + np_other_grass + (1 | prop_id),
    data = euc_data,
    family = nbinom2
  ) #nbimom2 lets var increases quadratically


# generate samples
samples <- parameters::simulate_model(model)
#> Model has no zero-inflation component. Simulating from conditional parameters.


# describe samples
bayestestR::describe_posterior(samples)
#> # Description of Posterior Distributions
#> 
#> Parameter      | Median |           89% CI |    pd |        89% ROPE | % in ROPE
#> --------------------------------------------------------------------------------
#> (Intercept)    | -1.072 | [-2.183, -0.057] | 0.944 | [-0.100, 0.100] |     1.122
#> ea_grass       | -0.001 | [-0.033,  0.029] | 0.525 | [-0.100, 0.100] |   100.000
#> ep_grass       | -0.050 | [-0.130,  0.038] | 0.839 | [-0.100, 0.100] |    85.297
#> np_grass       | -0.020 | [-0.054,  0.012] | 0.836 | [-0.100, 0.100] |   100.000
#> np_other_grass | -0.002 | [-0.362,  0.320] | 0.501 | [-0.100, 0.100] |    38.945


# or directly get summary of sample description
sp <- parameters::simulate_parameters(model, ci = .95, ci_method = "hdi", test = c("pd", "p_map"))
sp
#> Model has no zero-inflation component. Simulating from conditional parameters.
#> # Description of Posterior Distributions
#> 
#> Parameter      | Coefficient | p_MAP |    pd |              CI
#> --------------------------------------------------------------
#> (Intercept)    |      -1.037 | 0.281 | 0.933 | [-2.305, 0.282]
#> ea_grass       |      -0.001 | 0.973 | 0.511 | [-0.042, 0.037]
#> ep_grass       |      -0.054 | 0.553 | 0.842 | [-0.160, 0.047]
#> np_grass       |      -0.019 | 0.621 | 0.802 | [-0.057, 0.023]
#> np_other_grass |       0.019 | 0.999 | 0.540 | [-0.386, 0.450]

plot(sp) + see::theme_modern()
#> Model has no zero-inflation component. Simulating from conditional parameters.

# random effect variances
parameters::random_parameters(model)
#> # Random Effects
#> 
#> Within-Group Variance         2.92 (1.71)
#> Between-Group Variance
#>   Random Intercept (prop_id)   2.1 (1.45)
#> N (groups per factor)
#>   prop_id                       18
#> Observations                   346

insight::get_variance(model)
#> Warning: mu of 0.2 is too close to zero, estimate of random effect variances may be unreliable.
#> $var.fixed
#> [1] 0.3056285
#> 
#> $var.random
#> [1] 2.104233
#> 
#> $var.residual
#> [1] 2.91602
#> 
#> $var.distribution
#> [1] 2.91602
#> 
#> $var.dispersion
#> [1] 0
#> 
#> $var.intercept
#>  prop_id 
#> 2.104233

^{由 reprex package (v0.3.0)}

于 2020-05-26 创建

从 glmmTMB 输出中提取后验模式和可信区间

Extracting posterior modes and credible intervals from glmmTMB output

r

bayesian

mcmc

credible-interval

glmmtmb