Interpret output of lm with ordinal factor

Question

I have an example like the following:

library(tidyverse)
library(yardstick)
#> For binary classification, the first factor level is assumed to be the event.
#> Use the argument `event_level = "second"` to alter this as needed.
#> 
#> Attaching package: 'yardstick'
#> The following object is masked from 'package:readr':
#> 
#>     spec
data <- tibble(y = c(rnorm(30), rnorm(30,0.5), rnorm(30,1)),
           x = c(rep("a", 30), rep("b", 30), rep("c", 30)),
           covar = rnorm(90,0.1)) %>%
    mutate(x = factor(x, levels = c("a", "b", "c"), ordered = TRUE))
lm(y ~ x + covar, data = data) %>%
    tidy()
#> # A tibble: 4 × 5
#>   term        estimate std.error statistic     p.value
#>   <chr>          <dbl>     <dbl>     <dbl>       <dbl>
#> 1 (Intercept)   0.584      0.101     5.79  0.000000114
#> 2 x.L           0.522      0.175     2.99  0.00369    
#> 3 x.Q          -0.108      0.176    -0.615 0.540      
#> 4 covar        -0.0128     0.102    -0.125 0.901

Created on 2022-03-09 by the reprex package (v2.0.1)

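I want to know whether y depends on x, but I also want to account for the covariate covar.

How should I interpret the output of the lm model? What are x.L and x.Q? I could not find this in the function's documentation.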

Answer 1

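You have defined x as an ordered factor. That makes it more than a categorical variable: it is an ordinal variable, meaning its levels carry rank-order information.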
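To make the distinction concrete, here is a minimal sketch using a hypothetical three-level factor (not the data from the question). It only illustrates that an ordered factor supports rank comparisons, while the same values stored as a plain factor do not carry that information.

```r
# Hypothetical three-level factor, ordered the same way as x in the question.
x <- factor(c("a", "b", "c"), levels = c("a", "b", "c"), ordered = TRUE)

is_ord   <- is.ordered(x)   # is x stored as an ordered factor?
rank_cmp <- x[1] < x[3]     # rank comparison between levels "a" and "c"

# The same values as a plain (nominal) factor: no rank information attached.
x_nominal      <- factor(x, ordered = FALSE)
is_ord_nominal <- is.ordered(x_nominal)

print(c(ordered = is_ord, a_below_c = rank_cmp, nominal_is_ordered = is_ord_nominal))
```

If the ordering is not meaningful for your analysis, converting x to a plain factor this way is one option; lm would then use its default contrasts for unordered factors instead.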

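For an ordered factor, lm uses polynomial contrasts by default: it tests for linear (L), quadratic (Q), cubic (C), and higher-order effects. lm fits "number of levels minus 1" polynomial contrasts. In your example x has 3 levels, so x.L and x.Q appear in the output.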
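The polynomial contrasts described above can be inspected directly. The sketch below rebuilds data of the same shape as in the question (the `set.seed()` call and the object names `d`, `fit`, `fit0` are additions for illustration, so the numbers will not match the output shown in the question). It prints the contrast matrix for three levels and then, as one possible way to ask whether y depends on x at all after adjusting for covar, compares the model with and without x in a joint F-test.

```r
# Illustrative data with the same structure as in the question.
# set.seed() is an addition here, used only to make the sketch reproducible.
set.seed(1)
d <- data.frame(
  y     = c(rnorm(30), rnorm(30, 0.5), rnorm(30, 1)),
  x     = factor(rep(c("a", "b", "c"), each = 30),
                 levels = c("a", "b", "c"), ordered = TRUE),
  covar = rnorm(90, 0.1)
)

# Contrast matrix used for a 3-level ordered factor:
# column .L encodes the linear trend, column .Q the quadratic trend.
poly3 <- contr.poly(3)
print(poly3)

# The coefficient names are the factor name plus the contrast column name.
fit <- lm(y ~ x + covar, data = d)
coef_names <- names(coef(fit))
print(coef_names)

# Joint test of x.L and x.Q together: compare against the model without x.
fit0 <- lm(y ~ covar, data = d)
cmp  <- anova(fit0, fit)
print(cmp)
```

Roughly speaking, x.L captures a steady increase or decrease in y across the ordered levels and x.Q captures curvature around that trend, while the anova comparison tests both terms at once.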
