What does the ( | ) syntax mean in an R formula?

Question

I am working through a tutorial and came across the following syntax:

# assume 'S' is the name of the subjects column
# assume 'X1' is the name of the first factor column
# assume 'X2' is the name of the second factor column
# assume 'X3' is the name of the third factor column
# assume 'Y' is the name of the response column
# run the ART procedure on 'df'

# art() comes from the ARTool package
library(ARTool)

# linear mixed model syntax; see lme4::lmer
m = art(Y ~ X1 * X2 * X3 + (1|S), data=df)

anova(m)

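I am a bit confused by the ( | ) syntax. I looked at the documentation for lmer, whose linear mixed model syntax this is, and found: "Random-effects terms are distinguished by vertical bars (|) separating expressions for design matrices from grouping factors."

So I assume that 1 and S are two random-effect terms. S makes sense as a random effect, since it is a random variable that can represent the participants. But how can 1 be a random variable? What do the 1 and the | mean here?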

Answer 1

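The | symbol is used in formulas in different ways by different functions. In linear mixed models, it denotes random effects. Several types of random effects can be used in a mixed model: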

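Random intercept, where the intercept (but not the slope) varies between subjects,
Random slope, where the slope (but not the intercept) varies between subjects,
Random slope and intercept, where both vary between subjects. The slope and intercept can be modeled as correlated or uncorrelated.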
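As a rough sketch, the three cases can be written out as model equations. The notation is an assumption made here for illustration and is not taken from the tutorial: i indexes subjects, j indexes observations within a subject, x is a single covariate, and the u terms are subject-specific deviations with mean zero.

```latex
% Assumed notation: i = subject, j = observation within subject,
% u_{0i}, u_{1i} = subject-level random deviations, \varepsilon_{ij} = residual error.
\begin{align*}
\text{Random intercept:} \quad & y_{ij} = (\beta_0 + u_{0i}) + \beta_1 x_{ij} + \varepsilon_{ij} \\
\text{Random slope:} \quad & y_{ij} = \beta_0 + (\beta_1 + u_{1i})\, x_{ij} + \varepsilon_{ij} \\
\text{Both:} \quad & y_{ij} = (\beta_0 + u_{0i}) + (\beta_1 + u_{1i})\, x_{ij} + \varepsilon_{ij}
\end{align*}
```

In the last case, "correlated" means the covariance between u_{0i} and u_{1i} is estimated, while "uncorrelated" means it is fixed at zero.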

The 1 in the formula stands for the intercept, and the terms to the left of | specify which of these is used. Here are some examples, taken from my book:

library(lme4)
# Random intercept:
m1 <- lmer(Reaction ~ Days + (1|Subject), data = sleepstudy)

# Random slope:
m2 <- lmer(Reaction ~ Days + (0 + Days|Subject), data = sleepstudy)

# Correlated random intercept and slope:
m3 <- lmer(Reaction ~ Days + (1 + Days|Subject), data = sleepstudy)

# Uncorrelated random intercept and slope:
m4 <- lmer(Reaction ~ Days + (1|Subject) + (0 + Days|Subject),
           data = sleepstudy)
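
One way to see what these specifications do is to look at the per-subject random effects that lme4 estimates. The sketch below refits two of the models above (so that it runs on its own) and inspects them with ranef() and VarCorr(); the names re1 and re3 are just illustrative.

```r
library(lme4)

# Refit the random-intercept and the correlated intercept-and-slope models
m1 <- lmer(Reaction ~ Days + (1 | Subject), data = sleepstudy)
m3 <- lmer(Reaction ~ Days + (1 + Days | Subject), data = sleepstudy)

# ranef() returns one data frame per grouping factor, with one row per level
re1 <- ranef(m1)$Subject
re3 <- ranef(m3)$Subject

colnames(re1)  # only an intercept deviation per subject
colnames(re3)  # an intercept deviation and a Days-slope deviation per subject
nrow(re1)      # one row for each level of Subject

# Variance components, including the intercept-slope correlation for m3
VarCorr(m3)
```

The column that the 1 produces is labelled "(Intercept)", which is the quantity that is allowed to vary from subject to subject.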

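So in your example, (1|S) adds a random intercept, one for each value of S. In other words, 1 and S are not two separate random-effect terms: (1|S) is a single term in which the intercept (the 1) varies across the levels of the grouping factor S.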
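
That 1 means "intercept" rather than a variable is ordinary R formula behaviour and can be checked without any mixed-model package. A minimal sketch in base R, using a small made-up data frame d (hypothetical values, for illustration only):

```r
# Toy data, invented purely for illustration
d <- data.frame(x = c(1, 2, 3, 4), y = c(2.1, 3.9, 6.2, 7.8))

# '1' asks for an intercept column; '0' suppresses it
with_intercept    <- model.matrix(y ~ 1 + x, data = d)
without_intercept <- model.matrix(y ~ 0 + x, data = d)

colnames(with_intercept)     # includes "(Intercept)"
colnames(without_intercept)  # only "x"

# The intercept column is simply a column of ones
all(with_intercept[, "(Intercept)"] == 1)
```

Inside (1|S), the same intercept column is used, but its coefficient is allowed to differ for each level of S.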

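A similar, though notationally different, use of | can be found in the formulas of lmtree from partykit, which fits decision trees with linear models in the nodes. There, a formula looks like y ~ x1 + x2 | z1 + z2 + z3, where y is the response variable, the x variables are the explanatory variables in the linear models, and the z variables are used to build the tree.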
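
For comparison, here is a small sketch of that lmtree usage on simulated data. The data frame d and its columns are made up for illustration: the slope of y on x is constructed to change sign depending on z, so z is a plausible splitting variable. Whether and where the tree actually splits depends on the fitted tests, so no particular output is claimed.

```r
library(partykit)

# Simulated data (hypothetical): the effect of x on y depends on z
set.seed(1)
n <- 200
d <- data.frame(x = runif(n), z = runif(n))
slope <- ifelse(d$z > 0.5, 2, -2)
d$y <- slope * d$x + rnorm(n, sd = 0.2)

# Left of '|': regressors of the linear model fitted in each node
# Right of '|': candidate variables for splitting the tree
tree <- lmtree(y ~ x | z, data = d)
print(tree)
```

Here the | separates model regressors from partitioning variables, which is a different role from separating a random-effects design from a grouping factor in lmer.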
