缩短回归模型的公式语法
Shortening the formula syntax of a regression model
我想知道下面的回归模型的语法是否可以比现在更简洁(更短)?
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/bv1.csv')
library(nlme)
model <- lme(achieve ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time), correlation = corSymm(),
random = ~0 + D1:time | schcode/id, data = dat, weights = varIdent(form = ~1|factor(math)),
na.action = na.omit, control = lmeControl(maxIter = 200, msMaxIter = 200, niterEM = 50,
msMaxEval = 400))
coef(summary(model))
只关注固定效应部分。
原公式:
form1 <- ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time)
X1 <- model.matrix(form1, data=dat)
我认为这是等价的
form2 <- ~0 +
D1 + D2 +
(D1+D2):(time + schcontext + female + female:time+schcontext:time)
X2 <- model.matrix(form2, data=dat)
(不幸的是 ~ 0 + (D1 + D2):(1 + time + ...)
没有像我 liked/expected 那样工作。)
首先,模型矩阵具有正确的维度。盯着模型矩阵的列名并手动对列重新排序:
X2o <- X2[,c(1:3,6,4,7,5,8,9,11,10,12)]
all.equal(c(X1),c(X2o)) ##TRUE
(对于数值预测变量,您不需要 I(A*B)
:A:B
是等效的。)
实际上,使用 *
运算符
可以做得更好
form3 <- ~0 +
D1 + D2 +
(D1+D2):(time*(schcontext+female))
X3 <- model.matrix(form3, data=dat)
X3o <- X3[,c(1:3,6,4,7,5,8,10,12,9,11)]
all.equal(c(X1),c(X3o)) ## TRUE
比较公式长度:
sapply(list(form1,form2,form3),
function(x) nchar(as.character(x)[[2]]))
## [1] 183 84 54
我想知道下面的回归模型的语法是否可以比现在更简洁(更短)?
dat <- read.csv('https://raw.githubusercontent.com/rnorouzian/v/main/bv1.csv')
library(nlme)
model <- lme(achieve ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time), correlation = corSymm(),
random = ~0 + D1:time | schcode/id, data = dat, weights = varIdent(form = ~1|factor(math)),
na.action = na.omit, control = lmeControl(maxIter = 200, msMaxIter = 200, niterEM = 50,
msMaxEval = 400))
coef(summary(model))
只关注固定效应部分。
原公式:
form1 <- ~ 0 + D1 + D2+
D1:time + D2:time+
D1:schcontext + D2:schcontext +
D1:female + D2:female+
D1:I(female*time) + D2:I(female*time)+
D1:I(schcontext*time) + D2:I(schcontext*time)
X1 <- model.matrix(form1, data=dat)
我认为这是等价的
form2 <- ~0 +
D1 + D2 +
(D1+D2):(time + schcontext + female + female:time+schcontext:time)
X2 <- model.matrix(form2, data=dat)
(不幸的是 ~ 0 + (D1 + D2):(1 + time + ...)
没有像我 liked/expected 那样工作。)
首先,模型矩阵具有正确的维度。盯着模型矩阵的列名并手动对列重新排序:
X2o <- X2[,c(1:3,6,4,7,5,8,9,11,10,12)]
all.equal(c(X1),c(X2o)) ##TRUE
(对于数值预测变量,您不需要 I(A*B)
:A:B
是等效的。)
实际上,使用 *
运算符
form3 <- ~0 +
D1 + D2 +
(D1+D2):(time*(schcontext+female))
X3 <- model.matrix(form3, data=dat)
X3o <- X3[,c(1:3,6,4,7,5,8,10,12,9,11)]
all.equal(c(X1),c(X3o)) ## TRUE
比较公式长度:
sapply(list(form1,form2,form3),
function(x) nchar(as.character(x)[[2]]))
## [1] 183 84 54