泊松广义线性混合模型 (GLMM):lme4 和 glmmADMB 之间的硬决策
Poisson generalized linear mixed models (GLMMs): hard decision between lme4 and glmmADMB
随机系数泊松模型很难拟合,lme4 和 glmmADMB 之间的参数估计值往往存在一些差异。但就我而言:
# Packages
library(lme4)
library(glmmADMB)
#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
str(myds)
# 'data.frame': 526 obs. of 10 variables:
# $ Bioma : chr "Pampa" "Pampa" "Pampa" "Pampa" ...
# $ estacao : chr "verao" "verao" "verao" "verao" ...
# $ ciclo : chr "1°" "1°" "1°" "1°" ...
# $ Hour : int 22 23 0 1 2 3 4 5 6 7 ...
# $ anthill : num 23.5 23.5 23.5 23.5 23.5 ...
# $ formigueiro: int 2 2 2 2 2 2 2 2 2 2 ...
# $ ladenant : int 34 39 29 25 20 31 16 28 21 12 ...
# $ unladen : int 271 258 298 317 316 253 185 182 116 165 ...
# $ UR : num 65.7 69 71.3 75.8 78.1 ...
# $ temp : num 24.3 24.3 24 23.7 23.1 ...
我在生物群系(Bioma
)、温度(temp
)和湿度(UR
)的函数中有一些昆虫(ladenant
),但是formiguieros
是伪复制。然后我尝试使用 lme4 和 glmmADMB 对关系进行建模。
首先我尝试 lme4:
m.laden.1 <- glmer(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = poisson(link = "log"))
summary(m.laden.1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: poisson ( log )
# Formula: ladenant ~ Bioma + poly(temp, 2) + UR + (1 | formigueiro)
# Data: DataBase
# AIC BIC logLik deviance df.resid
# 21585.9 21615.8 -10786.0 21571.9 519
# Scaled residuals:
# Min 1Q Median 3Q Max
# -10.607 -4.245 -1.976 2.906 38.242
# Random effects:
# Groups Name Variance Std.Dev.
# formigueiro (Intercept) 0.02049 0.1432
# Number of obs: 526, groups: formigueiro, 5
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.7379495 0.0976701 7.556 4.17e-14 ***
# BiomaTransition 1.3978383 0.0209623 66.684 < 2e-16 ***
# BiomaPampa -0.1256759 0.0268164 -4.687 2.78e-06 ***
# poly(temp, 2)1 7.1035195 0.2079550 34.159 < 2e-16 ***
# poly(temp, 2)2 -7.2900687 0.2629908 -27.720 < 2e-16 ***
# UR 0.0302810 0.0008029 37.717 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Correlation of Fixed Effects:
# (Intr) BmTrns BimPmp p(,2)1 p(,2)2
# BiomaTrnstn -0.586
# BiomaPampa -0.199 0.352
# ply(tmp,2)1 -0.208 0.267 0.312
# ply(tmp,2)2 -0.191 0.085 -0.175 -0.039
# UR -0.746 0.709 0.188 0.230 0.316
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# Model is nearly unidentifiable: very large eigenvalue
# - Rescale variables?
第二次尝试 glmmADMB:
m.laden.2 <- glmmadmb(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = "poisson", link = "log")
summary(m.laden.2)
# Call:
# glmmadmb(formula = ladenant ~ Bioma + poly(temp, 2) + UR + (1 |
# formigueiro), data = DataBase, family = "poisson", link = "log")
# AIC: 12033.9
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.52390 0.26923 5.66 1.5e-08 ***
# BiomaTransition 0.23967 0.08878 2.70 0.0069 **
# BiomaPampa 0.09680 0.05198 1.86 0.0626 .
# poly(temp, 2)1 -0.38754 0.55678 -0.70 0.4864
# poly(temp, 2)2 -1.16028 0.39608 -2.93 0.0034 **
# UR 0.01560 0.00261 5.97 2.4e-09 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Number of observations: total=526, formigueiro=5
# Random effect variance(s):
# Group=formigueiro
# Variance StdDev
# (Intercept) 0.07497 0.2738
尽管在不同的统计包和估计之间,模型在 Biome
变量的情况下具有完全不同的显着性水平。我的问题是,
哪里有其他方法可以用来比较结果并选择最终模型?
提前致谢。
我有点忘乎所以了。 tl;dr 正如评论中指出的那样,很难让 glmmADMB
与泊松模型一起工作,但是过度分散的模型(例如负二项式)显然要好得多.此外,您可能应该在模型中加入随机斜率的某些方面...
包(colorblindr
是可选的):
library(lme4)
library(glmmADMB)
library(glmmTMB)
library(broom.mixed)
library(tidyverse) ## ggplot2, dplyr, tidyr, purrr ...
library(colorblindr) ## remotes::install_github("clauswilke/colorblindr")
theme_set(theme_bw())
获取数据:标准化输入变量,以便我们轻松绘制合理的系数图
## read.csv doesn't work for me out of the box, locale/encoding issues
myds <- readr::read_csv("my_glmm_dataset.csv") %>%
mutate(across(formigueiro, as.factor),
across(c(UR, temp), ~ drop(scale(.))))
公式和模型:
form <- ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro)
## random slopes, all independent (also tried with correlations (| instead
## of ||), but fails)
form_x <- ladenant ~ Bioma + poly(temp,2) + UR + (1 + UR + poly(temp,2) || formigueiro)
glmer_pois <- glmer(form, data = myds, family = poisson(link = "log"))
## fails
glmmADMB_pois <- try(glmmadmb(form, data = myds, family = "poisson"))
## fails ("Parameters were estimated, but standard errors were not:
## the most likely problem is that the curvature at MLE was zero or negative"
glmmTMB_pois <- glmmTMB(form, data = myds, family = poisson)
glmer_nb2 <- glmer.nb(form, data = myds)
glmmADMB_nb2 <- glmmadmb(form, data = myds, family = "nbinom2")
glmmTMB_nb2 <- update(glmmTMB_pois, family = "nbinom2")
glmmTMB_nb1 <- update(glmmTMB_pois, family = "nbinom1")
glmmTMB_nb2ext <- update(glmmTMB_nb2, formula = form_x)
放在一起:
modList <- tibble::lst(glmer_pois, glmmTMB_pois, glmer_nb2, glmmADMB_nb2, glmmTMB_nb2,
glmmTMB_nb1, glmmTMB_nb2ext)
bbmle::AICtab(modList)
dAIC df
glmmTMB_nb2ext 0.0 11
glmer_nb2 27.0 8
glmmADMB_nb2 27.1 8
glmmTMB_nb2 27.1 8
glmmTMB_nb1 79.5 8
glmmTMB_pois 16658.0 7
glmer_pois 16658.0 7
'nb2' 模型都可以,但随机斜率模型要好得多。
系数图,包括所有方法的固定效应:
tt <- (purrr::map_dfr(modList, tidy, effects = "fixed", conf.int = TRUE,
.id = "model") |>
dplyr::filter(term != "(Intercept)") |>
tidyr::separate(model, into = c("platform", "distrib"))
)
ggplot(tt, aes(y = term, x = estimate, xmin = conf.low, xmax = conf.high,
colour = platform, shape = distrib)) +
geom_pointrange(position = position_dodge(width = 0.25)) +
geom_vline(xintercept = 0, lty =2) +
scale_colour_OkabeIto()
随机系数泊松模型很难拟合,lme4 和 glmmADMB 之间的参数估计值往往存在一些差异。但就我而言:
# Packages
library(lme4)
library(glmmADMB)
#Open my dataset
myds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/my_glmm_dataset.csv")
str(myds)
# 'data.frame': 526 obs. of 10 variables:
# $ Bioma : chr "Pampa" "Pampa" "Pampa" "Pampa" ...
# $ estacao : chr "verao" "verao" "verao" "verao" ...
# $ ciclo : chr "1°" "1°" "1°" "1°" ...
# $ Hour : int 22 23 0 1 2 3 4 5 6 7 ...
# $ anthill : num 23.5 23.5 23.5 23.5 23.5 ...
# $ formigueiro: int 2 2 2 2 2 2 2 2 2 2 ...
# $ ladenant : int 34 39 29 25 20 31 16 28 21 12 ...
# $ unladen : int 271 258 298 317 316 253 185 182 116 165 ...
# $ UR : num 65.7 69 71.3 75.8 78.1 ...
# $ temp : num 24.3 24.3 24 23.7 23.1 ...
我在生物群系(Bioma
)、温度(temp
)和湿度(UR
)的函数中有一些昆虫(ladenant
),但是formiguieros
是伪复制。然后我尝试使用 lme4 和 glmmADMB 对关系进行建模。
首先我尝试 lme4:
m.laden.1 <- glmer(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = poisson(link = "log"))
summary(m.laden.1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: poisson ( log )
# Formula: ladenant ~ Bioma + poly(temp, 2) + UR + (1 | formigueiro)
# Data: DataBase
# AIC BIC logLik deviance df.resid
# 21585.9 21615.8 -10786.0 21571.9 519
# Scaled residuals:
# Min 1Q Median 3Q Max
# -10.607 -4.245 -1.976 2.906 38.242
# Random effects:
# Groups Name Variance Std.Dev.
# formigueiro (Intercept) 0.02049 0.1432
# Number of obs: 526, groups: formigueiro, 5
# Fixed effects:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 0.7379495 0.0976701 7.556 4.17e-14 ***
# BiomaTransition 1.3978383 0.0209623 66.684 < 2e-16 ***
# BiomaPampa -0.1256759 0.0268164 -4.687 2.78e-06 ***
# poly(temp, 2)1 7.1035195 0.2079550 34.159 < 2e-16 ***
# poly(temp, 2)2 -7.2900687 0.2629908 -27.720 < 2e-16 ***
# UR 0.0302810 0.0008029 37.717 < 2e-16 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Correlation of Fixed Effects:
# (Intr) BmTrns BimPmp p(,2)1 p(,2)2
# BiomaTrnstn -0.586
# BiomaPampa -0.199 0.352
# ply(tmp,2)1 -0.208 0.267 0.312
# ply(tmp,2)2 -0.191 0.085 -0.175 -0.039
# UR -0.746 0.709 0.188 0.230 0.316
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# Model is nearly unidentifiable: very large eigenvalue
# - Rescale variables?
第二次尝试 glmmADMB:
m.laden.2 <- glmmadmb(ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro), data = DataBase, family = "poisson", link = "log")
summary(m.laden.2)
# Call:
# glmmadmb(formula = ladenant ~ Bioma + poly(temp, 2) + UR + (1 |
# formigueiro), data = DataBase, family = "poisson", link = "log")
# AIC: 12033.9
# Coefficients:
# Estimate Std. Error z value Pr(>|z|)
# (Intercept) 1.52390 0.26923 5.66 1.5e-08 ***
# BiomaTransition 0.23967 0.08878 2.70 0.0069 **
# BiomaPampa 0.09680 0.05198 1.86 0.0626 .
# poly(temp, 2)1 -0.38754 0.55678 -0.70 0.4864
# poly(temp, 2)2 -1.16028 0.39608 -2.93 0.0034 **
# UR 0.01560 0.00261 5.97 2.4e-09 ***
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# Number of observations: total=526, formigueiro=5
# Random effect variance(s):
# Group=formigueiro
# Variance StdDev
# (Intercept) 0.07497 0.2738
尽管在不同的统计包和估计之间,模型在 Biome
变量的情况下具有完全不同的显着性水平。我的问题是,
哪里有其他方法可以用来比较结果并选择最终模型?
提前致谢。
我有点忘乎所以了。 tl;dr 正如评论中指出的那样,很难让 glmmADMB
与泊松模型一起工作,但是过度分散的模型(例如负二项式)显然要好得多.此外,您可能应该在模型中加入随机斜率的某些方面...
包(colorblindr
是可选的):
library(lme4)
library(glmmADMB)
library(glmmTMB)
library(broom.mixed)
library(tidyverse) ## ggplot2, dplyr, tidyr, purrr ...
library(colorblindr) ## remotes::install_github("clauswilke/colorblindr")
theme_set(theme_bw())
获取数据:标准化输入变量,以便我们轻松绘制合理的系数图
## read.csv doesn't work for me out of the box, locale/encoding issues
myds <- readr::read_csv("my_glmm_dataset.csv") %>%
mutate(across(formigueiro, as.factor),
across(c(UR, temp), ~ drop(scale(.))))
公式和模型:
form <- ladenant ~ Bioma + poly(temp,2) + UR + (1 | formigueiro)
## random slopes, all independent (also tried with correlations (| instead
## of ||), but fails)
form_x <- ladenant ~ Bioma + poly(temp,2) + UR + (1 + UR + poly(temp,2) || formigueiro)
glmer_pois <- glmer(form, data = myds, family = poisson(link = "log"))
## fails
glmmADMB_pois <- try(glmmadmb(form, data = myds, family = "poisson"))
## fails ("Parameters were estimated, but standard errors were not:
## the most likely problem is that the curvature at MLE was zero or negative"
glmmTMB_pois <- glmmTMB(form, data = myds, family = poisson)
glmer_nb2 <- glmer.nb(form, data = myds)
glmmADMB_nb2 <- glmmadmb(form, data = myds, family = "nbinom2")
glmmTMB_nb2 <- update(glmmTMB_pois, family = "nbinom2")
glmmTMB_nb1 <- update(glmmTMB_pois, family = "nbinom1")
glmmTMB_nb2ext <- update(glmmTMB_nb2, formula = form_x)
放在一起:
modList <- tibble::lst(glmer_pois, glmmTMB_pois, glmer_nb2, glmmADMB_nb2, glmmTMB_nb2,
glmmTMB_nb1, glmmTMB_nb2ext)
bbmle::AICtab(modList)
dAIC df
glmmTMB_nb2ext 0.0 11
glmer_nb2 27.0 8
glmmADMB_nb2 27.1 8
glmmTMB_nb2 27.1 8
glmmTMB_nb1 79.5 8
glmmTMB_pois 16658.0 7
glmer_pois 16658.0 7
'nb2' 模型都可以,但随机斜率模型要好得多。
系数图,包括所有方法的固定效应:
tt <- (purrr::map_dfr(modList, tidy, effects = "fixed", conf.int = TRUE,
.id = "model") |>
dplyr::filter(term != "(Intercept)") |>
tidyr::separate(model, into = c("platform", "distrib"))
)
ggplot(tt, aes(y = term, x = estimate, xmin = conf.low, xmax = conf.high,
colour = platform, shape = distrib)) +
geom_pointrange(position = position_dodge(width = 0.25)) +
geom_vline(xintercept = 0, lty =2) +
scale_colour_OkabeIto()