R 的 lm 显示具有有序因子的奇怪行为
R's lm showing strange behaviour with ordered factors
当我将有序因子插入 lm
函数时,结果出乎我的意料。
当然有一个很好的解释...
# Generate some data
# parameters
n = 20L
set.seed(11L)
# Ordered factor
t <- factor(sample(c(1L, 2L), size = n, replace = TRUE),
label = c("Low", "High"),
ordered = TRUE)
t
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: Low < High
# not ordered factor, keep reference level as High
tno <- factor(t , ordered = FALSE)
tno <- relevel(tno, ref = "High")
tno
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: High Low
# A simple indicator variable
ti <- t == "Low"
ti
[1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[12] TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
# Some dependent variable
y <- 10*rnorm(n)
# Run three regression
# Observe ordered factor is not giving the correct results
lm(y ~ t)
Call:
lm(formula = y ~ t)
Coefficients:
(Intercept) t.L
-3.6082 0.8038
lm(y ~ tno)
Call:
lm(formula = y ~ tno)
Coefficients:
(Intercept) tnoLow
-3.040 -1.137
lm(y ~ ti)
Call:
lm(formula = y ~ ti)
Coefficients:
(Intercept) tiTRUE
-3.040 -1.137
# Confirm correct intercept
mean(y[t == "High"])
[1] -3.039771
# Just rounding difference...
试试运行宁这个
rest<- lm(y ~ t)
restno <- lm(y ~ tno)
resti <- lm(y ~ ti)
rest$fitted.values
restno$fitted.values
resti$fitted.values
rest$xlevels
restno$xlevels
resti$xlevels
rest$contrasts
restno$contrasts
resti$contrasts
首先,您将看到所有三个模型的拟合值完全相同。所以 ordered 的模型不是 "wrong."
其次,你会看到层次不一样。事实上,只有 tno 有层次。其他人没有,因为您将它们视为数字,您可以这样做,因为这是一个二分变量。您还会看到该因子是一个字符串,而其他两个不是。
第三,你会看到 tno 和 ti 使用 "contr.treatment" 而 t 使用 "contr.poly" 这对于序数变量是有意义的。
如果你运行这个
restno_poly<- lm(y ~ tno, contrasts = list(tno = "contr.poly"))
restno_poly
你会得到
Call:
lm(formula = y ~ tno, contrasts = list(tno = "contr.poly"))
Coefficients:
(Intercept) tno.L
-3.6082 -0.8038
同样
rest_treatment<- lm(y ~ t, contrasts = list(t = "contr.treatment"))
同样给出了您期望的结果。
当我将有序因子插入 lm
函数时,结果出乎我的意料。
当然有一个很好的解释...
# Generate some data
# parameters
n = 20L
set.seed(11L)
# Ordered factor
t <- factor(sample(c(1L, 2L), size = n, replace = TRUE),
label = c("Low", "High"),
ordered = TRUE)
t
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: Low < High
# not ordered factor, keep reference level as High
tno <- factor(t , ordered = FALSE)
tno <- relevel(tno, ref = "High")
tno
[1] Low Low High Low Low High Low Low High Low Low Low High
[14] High High High Low Low Low Low
Levels: High Low
# A simple indicator variable
ti <- t == "Low"
ti
[1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
[12] TRUE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE
# Some dependent variable
y <- 10*rnorm(n)
# Run three regression
# Observe ordered factor is not giving the correct results
lm(y ~ t)
Call:
lm(formula = y ~ t)
Coefficients:
(Intercept) t.L
-3.6082 0.8038
lm(y ~ tno)
Call:
lm(formula = y ~ tno)
Coefficients:
(Intercept) tnoLow
-3.040 -1.137
lm(y ~ ti)
Call:
lm(formula = y ~ ti)
Coefficients:
(Intercept) tiTRUE
-3.040 -1.137
# Confirm correct intercept
mean(y[t == "High"])
[1] -3.039771
# Just rounding difference...
试试运行宁这个
rest<- lm(y ~ t)
restno <- lm(y ~ tno)
resti <- lm(y ~ ti)
rest$fitted.values
restno$fitted.values
resti$fitted.values
rest$xlevels
restno$xlevels
resti$xlevels
rest$contrasts
restno$contrasts
resti$contrasts
首先,您将看到所有三个模型的拟合值完全相同。所以 ordered 的模型不是 "wrong."
其次,你会看到层次不一样。事实上,只有 tno 有层次。其他人没有,因为您将它们视为数字,您可以这样做,因为这是一个二分变量。您还会看到该因子是一个字符串,而其他两个不是。
第三,你会看到 tno 和 ti 使用 "contr.treatment" 而 t 使用 "contr.poly" 这对于序数变量是有意义的。
如果你运行这个
restno_poly<- lm(y ~ tno, contrasts = list(tno = "contr.poly"))
restno_poly
你会得到
Call:
lm(formula = y ~ tno, contrasts = list(tno = "contr.poly"))
Coefficients:
(Intercept) tno.L
-3.6082 -0.8038
同样
rest_treatment<- lm(y ~ t, contrasts = list(t = "contr.treatment"))
同样给出了您期望的结果。