R 的 lm 显示具有有序因子的奇怪行为

R's lm showing strange behaviour with ordered factors

当我将有序因子插入 lm 函数时,结果出乎我的意料。

当然有一个很好的解释...

# Generate some data
# parameters
n = 20L
set.seed(11L)

# Ordered factor
t <- factor(sample(c(1L, 2L), size = n, replace = TRUE),
       label = c("Low", "High"), 
       ordered = TRUE)
t
 [1] Low  Low  High Low  Low  High Low  Low  High Low  Low  Low  High
[14] High High High Low  Low  Low  Low 
Levels: Low < High

# not ordered factor, keep reference level as High
tno <- factor(t , ordered = FALSE)
tno <- relevel(tno, ref = "High")
tno
 [1] Low  Low  High Low  Low  High Low  Low  High Low  Low  Low  High
[14] High High High Low  Low  Low  Low 
Levels: High Low

# A simple indicator variable
ti <- t == "Low"
ti
 [1]  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE FALSE  TRUE  TRUE
[12]  TRUE FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE  TRUE

# Some dependent variable
y <- 10*rnorm(n)

# Run three regression
# Observe ordered factor is not giving the correct results
lm(y ~ t)
Call:
lm(formula = y ~ t)

Coefficients:
(Intercept)          t.L  
    -3.6082       0.8038

lm(y ~ tno)
Call:
lm(formula = y ~ tno)

Coefficients:
(Intercept)       tnoLow  
     -3.040       -1.137 

lm(y ~ ti)
Call:
lm(formula = y ~ ti)

Coefficients:
(Intercept)       tiTRUE  
     -3.040       -1.137  


# Confirm correct intercept
mean(y[t == "High"])
[1] -3.039771
# Just rounding difference...

试试运行宁这个

rest<- lm(y ~ t)
restno <- lm(y ~ tno)
resti <- lm(y ~ ti)

rest$fitted.values
restno$fitted.values
resti$fitted.values

rest$xlevels
restno$xlevels
resti$xlevels

rest$contrasts
restno$contrasts
resti$contrasts

首先,您将看到所有三个模型的拟合值完全相同。所以 ordered 的模型不是 "wrong."

其次,你会看到层次不一样。事实上,只有 tno 有层次。其他人没有,因为您将它们视为数字,您可以这样做,因为这是一个二分变量。您还会看到该因子是一个字符串,而其他两个不是。

第三,你会看到 tno 和 ti 使用 "contr.treatment" 而 t 使用 "contr.poly" 这对于序数变量是有意义的。

如果你运行这个

restno_poly<- lm(y ~ tno, contrasts = list(tno = "contr.poly"))
restno_poly

你会得到

Call:
lm(formula = y ~ tno, contrasts = list(tno = "contr.poly"))

Coefficients:
(Intercept)        tno.L  
    -3.6082      -0.8038  

同样

rest_treatment<- lm(y ~ t, contrasts = list(t = "contr.treatment"))

同样给出了您期望的结果。

This page explains more.