使 lme4 中混合模型的随机部分的语法更短
Making the syntax for random part of a mixed-model in lme4 shorter
我想知道是否有更简洁的方法来编写语法的随机部分的语法,如下所示?
PS.: 本质上,我试图为我的二元预测变量 lb_wght
(见下方随机效应的相关矩阵)。
library(lme4)
math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
math <- within(math, lb_wght <- as.factor(lb_wght))
m <- lmer(
math ~ 0 + lb_wght + lb_wght:grade +
(0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) + # This line and
(0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) , # This line
data = math)
我对回答这个问题犹豫不决,因为我怀疑底层统计模型有问题,但这是 StackExchange 的编程部门,所以这里是编程解决方案:
要点:
- 表示为 0 和 1 的二进制分类变量已经进行了虚拟编码。通过
factor
告诉 R 它代表分类对比当然没有问题,有时也很有用,但 R 会在拟合该模型之前将其变回 0 和 1。
- 0 和 1 也可以解释为布尔值。我们可以通过否定来交换布尔值。
- 使用
dummy(lb_wght, "0")
只是提供原始二进制变量的反向编码。同样,dummy(lb_wght, "1")
本质上是一个无操作——它只是返回原始的虚拟对比变量。
- 我们可以将
dummy(lb_wght, "0")
的反向编码表示为布尔否定。
数据争论热身
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> # A two-level contrast represented as dummy coding means we have 0 and 1
> # This is the same as FALSE and TRUE and so we can swap them using boolean negation
> math <- within(math, {
+ lb_wght0 <- as.numeric(!lb_wght)
+ lb_wght1 <- lb_wght
+ })
> head(math) # check what those new vars look like
id female lb_wght anti_k1 math grade occ age men spring anti nb_wght lb_wght1 lb_wght0
1 201 1 0 0 38 1 2 111 0 1 0 1 0 1
2 201 1 0 0 55 3 3 135 1 1 0 1 0 1
3 303 1 0 1 26 0 2 121 0 1 2 1 0 1
4 303 1 0 1 33 3 3 145 0 1 2 1 0 1
5 2702 0 0 0 56 0 2 100 . 1 0 1 0 1
6 2702 0 0 0 58 2 3 125 . 1 2 1 0 1
RE 中无相关性
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade || id), # This line and
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
((0 + lb_wght0 | id) + (0 + lb_wght1 | id) + (0 + lb_wght0:grade | id) + (0 + grade:lb_wght1 | id))
Data: math
REML criterion at convergence: 15932
Scaled residuals:
Min 1Q Median 3Q Max
-3.06601 -0.52971 -0.00312 0.53661 2.54121
Random effects:
Groups Name Variance Std.Dev.
id lb_wght0 61.9312 7.8696
id.1 lb_wght1 84.7006 9.2033
id.2 lb_wght0:grade 0.7044 0.8393
id.3 grade:lb_wght1 0.6598 0.8123
Residual 36.3585 6.0298
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.48689 0.36619 96.91
lb_wght1 32.81813 1.35307 24.25
lb_wght0:grade 4.29283 0.09059 47.39
grade:lb_wght1 4.86449 0.32029 15.19
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.533 0.000
grd:lb_wgh1 0.000 -0.489 0.000
匹配OP中的关联结构
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade | id) +
+ (0 + lb_wght1 + lb_wght1:grade | id),
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
(0 + lb_wght0 + lb_wght0:grade | id) + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000
为了比较,OP 的模型:
> math <- within(math, lb_wght <- as.factor(lb_wght))
> m <- lmer(
+ math ~ 0 + lb_wght + lb_wght:grade +
+ (0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) + # This line and
+ (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) , # This line
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght + lb_wght:grade + (0 + dummy(lb_wght, "0") +
dummy(lb_wght, "0"):grade | id) + (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id dummy(lb_wght, "0") 61.1969 7.8228
dummy(lb_wght, "0"):grade 0.6711 0.8192 0.04
id.1 dummy(lb_wght, "1") 97.4494 9.8716
dummy(lb_wght, "1"):grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
lb_wght1:grade 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
lb_wght1:gr 0.000 -0.567 0.000
编辑:2021-02-08 20:57 UTC
NB 这不是所问的问题,而是 post-hoc 评论中要求的问题。尽管如此,指标变量的编程生成和使用基数 R 的公式可能很有趣。这可以重写为使用 apply
....但这似乎是对手头问题的过早优化。我不认为在不真正考虑该模型甚至意味着什么的情况下将这种方法推广到两个以上的级别是个好主意,但同样,这是对编程问题的回答,而不是对它实现的统计过程.
在实验变量中做任意数量的水平
这将自动生成所有指标变量和公式。
> # GOOD LUCK WITH THIS
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> grps <- unique(math$lb_wght)
>
> form <- "math ~ 0"
>
> for(g in grps){
+ gname <- paste0("lb_wght",g)
+ math[, gname] <- ifelse(math$lb_wght == g, 1, 0)
+ term <- paste("0", gname, paste0(gname,":grade"), sep="+")
+ re <- paste0("(", term, "|id)")
+ form <- paste(form, term, re, sep="+")
+ }
>
> form <- as.formula(form)
>
> m <- lmer(form, data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + 0 + lb_wght0 + lb_wght0:grade + (0 + lb_wght0 + lb_wght0:grade |
id) + 0 + lb_wght1 + lb_wght1:grade + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000
我想知道是否有更简洁的方法来编写语法的随机部分的语法,如下所示?
PS.: 本质上,我试图为我的二元预测变量 lb_wght
(见下方随机效应的相关矩阵)。
library(lme4)
math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
math <- within(math, lb_wght <- as.factor(lb_wght))
m <- lmer(
math ~ 0 + lb_wght + lb_wght:grade +
(0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) + # This line and
(0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) , # This line
data = math)
我对回答这个问题犹豫不决,因为我怀疑底层统计模型有问题,但这是 StackExchange 的编程部门,所以这里是编程解决方案:
要点:
- 表示为 0 和 1 的二进制分类变量已经进行了虚拟编码。通过
factor
告诉 R 它代表分类对比当然没有问题,有时也很有用,但 R 会在拟合该模型之前将其变回 0 和 1。 - 0 和 1 也可以解释为布尔值。我们可以通过否定来交换布尔值。
- 使用
dummy(lb_wght, "0")
只是提供原始二进制变量的反向编码。同样,dummy(lb_wght, "1")
本质上是一个无操作——它只是返回原始的虚拟对比变量。 - 我们可以将
dummy(lb_wght, "0")
的反向编码表示为布尔否定。
数据争论热身
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> # A two-level contrast represented as dummy coding means we have 0 and 1
> # This is the same as FALSE and TRUE and so we can swap them using boolean negation
> math <- within(math, {
+ lb_wght0 <- as.numeric(!lb_wght)
+ lb_wght1 <- lb_wght
+ })
> head(math) # check what those new vars look like
id female lb_wght anti_k1 math grade occ age men spring anti nb_wght lb_wght1 lb_wght0
1 201 1 0 0 38 1 2 111 0 1 0 1 0 1
2 201 1 0 0 55 3 3 135 1 1 0 1 0 1
3 303 1 0 1 26 0 2 121 0 1 2 1 0 1
4 303 1 0 1 33 3 3 145 0 1 2 1 0 1
5 2702 0 0 0 56 0 2 100 . 1 0 1 0 1
6 2702 0 0 0 58 2 3 125 . 1 2 1 0 1
RE 中无相关性
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade || id), # This line and
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
((0 + lb_wght0 | id) + (0 + lb_wght1 | id) + (0 + lb_wght0:grade | id) + (0 + grade:lb_wght1 | id))
Data: math
REML criterion at convergence: 15932
Scaled residuals:
Min 1Q Median 3Q Max
-3.06601 -0.52971 -0.00312 0.53661 2.54121
Random effects:
Groups Name Variance Std.Dev.
id lb_wght0 61.9312 7.8696
id.1 lb_wght1 84.7006 9.2033
id.2 lb_wght0:grade 0.7044 0.8393
id.3 grade:lb_wght1 0.6598 0.8123
Residual 36.3585 6.0298
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.48689 0.36619 96.91
lb_wght1 32.81813 1.35307 24.25
lb_wght0:grade 4.29283 0.09059 47.39
grade:lb_wght1 4.86449 0.32029 15.19
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.533 0.000
grd:lb_wgh1 0.000 -0.489 0.000
匹配OP中的关联结构
> m <- lmer(
+ math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
+ (0 + lb_wght0 + lb_wght0:grade | id) +
+ (0 + lb_wght1 + lb_wght1:grade | id),
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght0 + lb_wght0:grade + lb_wght1 + lb_wght1:grade +
(0 + lb_wght0 + lb_wght0:grade | id) + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000
为了比较,OP 的模型:
> math <- within(math, lb_wght <- as.factor(lb_wght))
> m <- lmer(
+ math ~ 0 + lb_wght + lb_wght:grade +
+ (0 + dummy(lb_wght, "0") + dummy(lb_wght, "0"):grade|id) + # This line and
+ (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade|id) , # This line
+ data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + lb_wght + lb_wght:grade + (0 + dummy(lb_wght, "0") +
dummy(lb_wght, "0"):grade | id) + (0 + dummy(lb_wght, "1") + dummy(lb_wght, "1"):grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id dummy(lb_wght, "0") 61.1969 7.8228
dummy(lb_wght, "0"):grade 0.6711 0.8192 0.04
id.1 dummy(lb_wght, "1") 97.4494 9.8716
dummy(lb_wght, "1"):grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
lb_wght1:grade 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
lb_wght1:gr 0.000 -0.567 0.000
编辑:2021-02-08 20:57 UTC
NB 这不是所问的问题,而是 post-hoc 评论中要求的问题。尽管如此,指标变量的编程生成和使用基数 R 的公式可能很有趣。这可以重写为使用 apply
....但这似乎是对手头问题的过早优化。我不认为在不真正考虑该模型甚至意味着什么的情况下将这种方法推广到两个以上的级别是个好主意,但同样,这是对编程问题的回答,而不是对它实现的统计过程.
在实验变量中做任意数量的水平
这将自动生成所有指标变量和公式。
> # GOOD LUCK WITH THIS
> library("lme4")
>
> math <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/mlgrp1.csv')
> grps <- unique(math$lb_wght)
>
> form <- "math ~ 0"
>
> for(g in grps){
+ gname <- paste0("lb_wght",g)
+ math[, gname] <- ifelse(math$lb_wght == g, 1, 0)
+ term <- paste("0", gname, paste0(gname,":grade"), sep="+")
+ re <- paste0("(", term, "|id)")
+ form <- paste(form, term, re, sep="+")
+ }
>
> form <- as.formula(form)
>
> m <- lmer(form, data = math)
> summary(m)
Linear mixed model fit by REML ['lmerMod']
Formula: math ~ 0 + 0 + lb_wght0 + lb_wght0:grade + (0 + lb_wght0 + lb_wght0:grade |
id) + 0 + lb_wght1 + lb_wght1:grade + (0 + lb_wght1 + lb_wght1:grade | id)
Data: math
REML criterion at convergence: 15931.1
Scaled residuals:
Min 1Q Median 3Q Max
-3.09189 -0.52867 -0.00063 0.53419 2.54169
Random effects:
Groups Name Variance Std.Dev. Corr
id lb_wght0 61.1969 7.8228
lb_wght0:grade 0.6711 0.8192 0.04
id.1 lb_wght1 97.4494 9.8716
lb_wght1:grade 1.4314 1.1964 -0.35
Residual 36.2755 6.0229
Number of obs: 2221, groups: id, 932
Fixed effects:
Estimate Std. Error t value
lb_wght0 35.4848 0.3646 97.31
lb_wght1 32.8506 1.4214 23.11
lb_wght0:grade 4.2930 0.0902 47.60
grade:lb_wght1 4.8827 0.3414 14.30
Correlation of Fixed Effects:
lb_wg0 lb_wg1 lb_w0:
lb_wght1 0.000
lb_wght0:gr -0.527 0.000
grd:lb_wgh1 0.000 -0.567 0.000