使 lme4 中混合模型的随机部分的语法更短

Making the syntax for random part of a mixed-model in lme4 shorter

我想知道是否有更简洁的方法来编写语法的随机部分的语法,如下所示?

PS.: 本质上,我试图为我的二元预测变量 lb_wght (见下方随机效应的相关矩阵)。

library(lme4)

math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
math <- within(math, lb_wght <- as.factor(lb_wght))

m <- lmer(
    math ~ 0 + lb_wght + lb_wght:grade + 
    (0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) +  # This line and
    (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) ,  # This line
    data = math)

我对回答这个问题犹豫不决,因为我怀疑底层统计模型有问题,但这是 StackExchange 的编程部门,所以这里是编程解决方案:

要点:

  • 表示为 0 和 1 的二进制分类变量已经进行了虚拟编码。通过 factor 告诉 R 它代表分类对比当然没有问题,有时也很有用,但 R 会在拟合该模型之前将其变回 0 和 1。
  • 0 和 1 也可以解释为布尔值。我们可以通过否定来交换布尔值。
  • 使用dummy(lb_wght, "0")只是提供原始二进制变量的反向编码。同样,dummy(lb_wght, "1") 本质上是一个无操作——它只是返回原始的虚拟对比变量。
  • 我们可以将dummy(lb_wght, "0")的反向编码表示为布尔否定。

数据争论热身

> library("lme4")
> 
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> # A two-level contrast represented as dummy coding means we have 0 and 1
> # This is the same as FALSE and TRUE and so we can swap them using boolean negation
> math <- within(math, {
+                         lb_wght0 <- as.numeric(!lb_wght) 
+                         lb_wght1 <- lb_wght
+                      })
> head(math) # check what those new vars look like
    id female lb_wght anti_k1 math grade occ age men spring anti nb_wght lb_wght1 lb_wght0
1  201      1       0       0   38     1   2 111   0      1    0       1        0        1
2  201      1       0       0   55     3   3 135   1      1    0       1        0        1
3  303      1       0       1   26     0   2 121   0      1    2       1        0        1
4  303      1       0       1   33     3   3 145   0      1    2       1        0        1
5 2702      0       0       0   56     0   2 100   .      1    0       1        0        1
6 2702      0       0       0   58     2   3 125   .      1    2       1        0        1

RE 中无相关性

> m <- lmer(
+   math ~ 0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade + 
+     (0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade || id),   # This line and
+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +  
    ((0 + lb_wght0 | id) + (0 + lb_wght1 | id) + (0 + lb_wght0:grade |          id) + (0 + grade:lb_wght1 | id))
   Data: math

REML criterion at convergence: 15932

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.06601 -0.52971 -0.00312  0.53661  2.54121 

Random effects:
 Groups   Name           Variance Std.Dev.
 id       lb_wght0       61.9312  7.8696  
 id.1     lb_wght1       84.7006  9.2033  
 id.2     lb_wght0:grade  0.7044  0.8393  
 id.3     grade:lb_wght1  0.6598  0.8123  
 Residual                36.3585  6.0298  
Number of obs: 2221, groups:  id, 932

Fixed effects:
               Estimate Std. Error t value
lb_wght0       35.48689    0.36619   96.91
lb_wght1       32.81813    1.35307   24.25
lb_wght0:grade  4.29283    0.09059   47.39
grade:lb_wght1  4.86449    0.32029   15.19

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.533  0.000       
grd:lb_wgh1  0.000 -0.489  0.000

匹配OP中的关联结构

> m <- lmer(
+   math ~ 0 + lb_wght0 + lb_wght0:grade  + lb_wght1 +  lb_wght1:grade + 
+     (0 + lb_wght0 + lb_wght0:grade | id) + 
+     (0 + lb_wght1 +  lb_wght1:grade | id),
+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +  
    (0 + lb_wght0 + lb_wght0:grade | id) + (0 + lb_wght1 + lb_wght1:grade |      id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name           Variance Std.Dev. Corr 
 id       lb_wght0       61.1969  7.8228        
          lb_wght0:grade  0.6711  0.8192   0.04 
 id.1     lb_wght1       97.4494  9.8716        
          lb_wght1:grade  1.4314  1.1964   -0.35
 Residual                36.2755  6.0229        
Number of obs: 2221, groups:  id, 932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
grade:lb_wght1   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
grd:lb_wgh1  0.000 -0.567  0.000

为了比较,OP 的模型:

> math <- within(math, lb_wght <- as.factor(lb_wght))
> m <- lmer(
+   math ~ 0 + lb_wght + lb_wght:grade + 
+     (0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) +  # This line and
+     (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) ,  # This line
+   data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght + lb_wght:grade + (0 + dummy(lb_wght, "0") +  
    dummy(lb_wght, "0"):grade | id) + (0 + dummy(lb_wght, "1") +      dummy(lb_wght, "1"):grade | id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name                      Variance Std.Dev. Corr 
 id       dummy(lb_wght, "0")       61.1969  7.8228        
          dummy(lb_wght, "0"):grade  0.6711  0.8192   0.04 
 id.1     dummy(lb_wght, "1")       97.4494  9.8716        
          dummy(lb_wght, "1"):grade  1.4314  1.1964   -0.35
 Residual                           36.2755  6.0229        
Number of obs: 2221, groups:  id, 932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
lb_wght1:grade   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
lb_wght1:gr  0.000 -0.567  0.000

编辑:2021-02-08 20:57 UTC

NB 这不是所问的问题,而是 post-hoc 评论中要求的问题。尽管如此,指标变量的编程生成和使用基数 R 的公式可能很有趣。这可以重写为使用 apply ....但这似乎是对手头问题的过早优化。我不认为在不真正考虑该模型甚至意味着什么的情况下将这种方法推广到两个以上的级别是个好主意,但同样,这是对编程问题的回答,而不是对它实现的统计过程.

在实验变量中做任意数量的水平

这将自动生成所有指标变量和公式。

> # GOOD LUCK WITH THIS
> library("lme4")
> 
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> grps <- unique(math$lb_wght)
> 
> form <- "math ~ 0"
> 
> for(g in grps){
+   gname <- paste0("lb_wght",g)
+   math[, gname] <- ifelse(math$lb_wght == g, 1, 0) 
+   term <- paste("0", gname, paste0(gname,":grade"), sep="+")
+   re <- paste0("(", term, "|id)")
+   form <- paste(form, term, re, sep="+")
+ }
> 
> form <- as.formula(form)
> 
> m <- lmer(form, data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + 0 + lb_wght0 + lb_wght0:grade + (0 + lb_wght0 + lb_wght0:grade |  
    id) + 0 + lb_wght1 + lb_wght1:grade + (0 + lb_wght1 + lb_wght1:grade |      id)
   Data: math

REML criterion at convergence: 15931.1

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-3.09189 -0.52867 -0.00063  0.53419  2.54169 

Random effects:
 Groups   Name           Variance Std.Dev. Corr 
 id       lb_wght0       61.1969  7.8228        
          lb_wght0:grade  0.6711  0.8192   0.04 
 id.1     lb_wght1       97.4494  9.8716        
          lb_wght1:grade  1.4314  1.1964   -0.35
 Residual                36.2755  6.0229        
Number of obs: 2221, groups:  id, 932

Fixed effects:
               Estimate Std. Error t value
lb_wght0        35.4848     0.3646   97.31
lb_wght1        32.8506     1.4214   23.11
lb_wght0:grade   4.2930     0.0902   47.60
grade:lb_wght1   4.8827     0.3414   14.30

Correlation of Fixed Effects:
            lb_wg0 lb_wg1 lb_w0:
lb_wght1     0.000              
lb_wght0:gr -0.527  0.000       
grd:lb_wgh1  0.000 -0.567  0.000