用于百分比响应变量的具有准二项式族的 GLM 混合模型

GLM mixed model with quasibinomial family for percentual response variable

除了时间流逝(date)和10倍积分,我没有任何'treatment'。我总共有 43190 个测量值,它们是百分比响应变量 (canopycov) 的连续二项式数据(0.0 到 1.0)。在 glm 逻辑中,这是一个 quasibinomial 案例,但我在 MASS 包中只找到 glmmPQL 可供使用,但模型不正确,我有 NA 用于 p-values 在所有日期。就我而言,我尝试:

#Packages
library(MASS)

# Dataset
ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
# 'data.frame': 43190 obs. of  3 variables:
#  $ date     : chr  "2021-12-06" "2021-12-06" "2021-12-06" "2021-12-06" ...
#  $ canopycov: int  22 24 24 24 25 25 25 25 26 26 ...
#  $ rep      : chr  "r1" "r1" "r1" "r1" ...
# Binomial Generalized Linear Mixed Models
m.1 <- glmmPQL(canopycov/100~date,random=~1|date,
              family="quasibinomial",data=ds)
summary(m.1)
#Linear mixed-effects model fit by maximum likelihood
#   Data: ds 
#   AIC BIC logLik
#    NA  NA     NA

# Random effects:
#  Formula: ~1 | date
#          (Intercept)  Residual
# StdDev: 1.251838e-06 0.1443305

# Variance function:
#  Structure: fixed weights
#  Formula: ~invwt 
# Fixed effects:  canopycov/100 ~ date 
#                     Value   Std.Error    DF    t-value p-value
# (Intercept)    -0.5955403 0.004589042 43180 -129.77442       0
# date2021-06-14 -0.1249648 0.006555217     0  -19.06341     NaN
# date2021-07-09  0.7661870 0.006363749     0  120.39868     NaN
# date2021-07-24  1.0582366 0.006434893     0  164.45286     NaN
# date2021-08-03  1.0509474 0.006432295     0  163.38607     NaN
# date2021-08-08  1.0794612 0.006442704     0  167.54784     NaN
# date2021-09-02  0.9312346 0.006395722     0  145.60274     NaN
# date2021-09-07  0.9236196 0.006393780     0  144.45595     NaN
# date2021-09-22  0.7268144 0.006359224     0  114.29293     NaN
# date2021-12-06  1.3109809 0.006552314     0  200.07907     NaN
#  Correlation: 
#                (Intr) d2021-06 d2021-07-0 d2021-07-2 d2021-08-03 d2021-08-08
# date2021-06-14 -0.700                                                       
# date2021-07-09 -0.721  0.505                                                
# date2021-07-24 -0.713  0.499    0.514                                       
# date2021-08-03 -0.713  0.499    0.514      0.509                            
# date2021-08-08 -0.712  0.499    0.514      0.508      0.508                 
# date2021-09-02 -0.718  0.502    0.517      0.512      0.512       0.511     
# date2021-09-07 -0.718  0.502    0.518      0.512      0.512       0.511     
# date2021-09-22 -0.722  0.505    0.520      0.515      0.515       0.514     
# date2021-12-06 -0.700  0.490    0.505      0.499      0.500       0.499     
#                d2021-09-02 d2021-09-07 d2021-09-2
# date2021-06-14                                   
# date2021-07-09                                   
# date2021-07-24                                   
# date2021-08-03                                   
# date2021-08-08                                   
# date2021-09-02                                   
# date2021-09-07  0.515                            
# date2021-09-22  0.518       0.518                
# date2021-12-06  0.503       0.503       0.505    

# Standardized Within-Group Residuals:
#         Min          Q1         Med          Q3         Max 
# -6.66259139 -0.47887669  0.09634211  0.54135914  4.32231889 

# Number of Observations: 43190
# Number of Groups: 10 

我想正确地指定我的数据是在混合效果中临时伪复制的,但我找不到其他方法。拜托,我需要任何帮助来解决它。

我不明白这里 quasi-binomial 模型的动机,有一些关于二项式和准二项式密度的很好的讨论 here and here 可能值得一读(包括应用程序)。

代码的问题在于您将 date 作为字符,因此 R 不知道它的日期。您必须决定时间的测量单位以及参考点,但是一旦您解决了这个问题,模型就可以正常工作了。


ds<-read.csv("https://raw.githubusercontent.com/Leprechault/trash/main/pred_attack_F.csv")
str(ds)
head(ds)
class(ds$date)
ds$datex <- as.Date(ds$date)
summary(as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days")))
ds$date_time <- as.numeric(difftime(ds$datex, as.Date(Sys.Date(), "%d%b%Y"), units = "days"))
# scale for laplace
ds$sdate_time <- scale(ds$date_time)

m1 <-  glmer(cbind(canopycov,100 - canopycov)~ date_time + (1|date_time),
                    family="binomial",data=ds)
summary(m1)
# Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
# Family: binomial  ( logit )
# Formula: canopycov/100 ~ date_time + (1 | date_time)
# Data: ds
# 
# AIC      BIC   logLik deviance df.resid 
# 35109.7  35135.7 -17551.9  35103.7    43187 
# 
# Scaled residuals: 
#   Min       1Q   Median       3Q      Max 
# -19.0451  -1.5605  -0.5155  -0.1594   0.8081 
# 
# Random effects:
#   Groups    Name        Variance Std.Dev.
# date_time (Intercept) 0        0       
# Number of obs: 43190, groups:  date_time, 10
# 
# Fixed effects:
#   Estimate Std. Error z value Pr(>|z|)    
# (Intercept) 8.5062394  0.0866696   98.15   <2e-16 ***
#   date_time   0.0399010  0.0004348   91.77   <2e-16 ***
#   ---
#   Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Correlation of Fixed Effects:
#   (Intr)
# date_time 0.988 
# optimizer (Nelder_Mead) convergence code: 0 (OK)
# boundary (singular) fit: see ?isSingular

m2 <- MASS::glmmPQL(canopycov/100~ date_time,random=~1|date_time,
               family="quasibinomial",data=ds)

summary(m2)
# Linear mixed-effects model fit by maximum likelihood
# Data: ds 
# AIC BIC logLik
# NA  NA     NA
# 
# Random effects:
#   Formula: ~1 | date_time
# (Intercept)  Residual
# StdDev:   0.3082808 0.1443456
# 
# Variance function:
#   Structure: fixed weights
# Formula: ~invwt 
# Fixed effects:  canopycov/100 ~ date_time 
# Value Std.Error    DF  t-value p-value
# (Intercept) 0.1767127 0.0974997 43180 1.812443  0.0699
# date_time   0.3232878 0.0975013     8 3.315728  0.0106
# Correlation: 
#   (Intr)
# date_time 0     
# 
# Standardized Within-Group Residuals:
#   Min          Q1         Med          Q3         Max 
# -6.66205315 -0.47852364  0.09635514  0.54154467  4.32129236 
# 
# Number of Observations: 43190
# Number of Groups: 10