Custom printing of factors/contrasts for lm

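I'm looking for a way to edit how lm / summary.lm label factor variables before printing. For example, I'd like the coefficients to read Month: 6 rather than Month6, that is, with an extra space and a : between the variable name and the factor level. I don't want to split the factor variable into separate columns, as model.matrix does.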

Ideally the change could be made either before lm is evaluated, or after it but before summary.lm.

Example:

> aa = airquality
> aa$Month = as.factor(aa$Month)
> # possible action
> ll = lm(Ozone~Month, aa)
> # possible action
> ss = summary(ll)
> ss

Call:
lm(formula = Ozone ~ Month, data = aa)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.115 -16.823  -7.282  13.125 108.038 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)   23.615      5.759   4.101 7.87e-05 ***
Month6         5.829     11.356   0.513    0.609    
Month7        35.500      8.144   4.359 2.93e-05 ***
Month8        36.346      8.144   4.463 1.95e-05 ***
Month9         7.833      7.931   0.988    0.325    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 29.36 on 111 degrees of freedom
  (37 observations deleted due to missingness)
Multiple R-squared:  0.2352,    Adjusted R-squared:  0.2077 
F-statistic: 8.536 on 4 and 111 DF,  p-value: 4.827e-06

You can overwrite the names attribute of the coefficients member of the ll object:

names(ll$coefficients) <- gsub("^(.*)(\\d)$", "\\1: \\2", names(ll$coefficients))

This means you get:

ll

#> Call:
#> lm(formula = Ozone ~ Month, data = aa)
#> 
#> Coefficients:
#> (Intercept)     Month: 6     Month: 7     Month: 8     Month: 9  
#>      23.615        5.829       35.500       36.346        7.833  

summary(ll)
 
#> Call:
#> lm(formula = Ozone ~ Month, data = aa)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -52.115 -16.823  -7.282  13.125 108.038 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   23.615      5.759   4.101 7.87e-05 ***
#> Month: 6       5.829     11.356   0.513    0.609    
#> Month: 7      35.500      8.144   4.359 2.93e-05 ***
#> Month: 8      36.346      8.144   4.463 1.95e-05 ***
#> Month: 9       7.833      7.931   0.988    0.325    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 29.36 on 111 degrees of freedom
#>   (37 observations deleted due to missingness)
#> Multiple R-squared:  0.2352, Adjusted R-squared:  0.2077 
#> F-statistic: 8.536 on 4 and 111 DF,  p-value: 4.827e-06
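
If you would rather leave the fitted model untouched, the same renaming can be applied to the summary object instead, since its coefficients member is a matrix whose row names are the labels that get printed. The following is a minimal sketch under that assumption; the names fit and ss_relabelled are only illustrative, and the pattern is deliberately anchored on Month so that other coefficient names are left alone.

```r
aa <- airquality
aa$Month <- as.factor(aa$Month)
fit <- lm(Ozone ~ Month, aa)

# Rename only the row labels of the summary's coefficient matrix;
# the names stored in the lm object itself stay as they were.
ss_relabelled <- summary(fit)
rownames(ss_relabelled$coefficients) <- sub(
  "^(Month)(.+)$", "\\1: \\2", rownames(ss_relabelled$coefficients)
)

ss_relabelled
```

This covers the "after lm but before printing" case: fit keeps its original coefficient names, so any later code that looks coefficients up by name is unaffected.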

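For a more general solution, we can loop over the factor names found in contrasts and use gsub to rewrite the matching names in coefficients (modified following the OP's suggestion):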

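iris2 <- iris
# Add a second, randomly assigned factor so the model contains two factors
iris2$Species2 <- sample(unique(iris$Species), nrow(iris2), TRUE)
ll <- lm(Sepal.Length ~ Sepal.Width + Species + Species2, iris2)

# For each factor in the model, insert ": " between its name and its level
for (i in names(ll$contrasts)) {
  alts <- paste0(levels(iris2[[i]]), collapse = "|")
  names(ll$coefficients) <- gsub(glue::glue("^({i})((?:{alts}))$"),
                                 "\\1: \\2", names(ll$coefficients))
}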

summary(ll)
#> 
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width + Species + Species2, 
#>     data = iris2)
#> 
#> Residuals:
#>      Min       1Q   Median       3Q      Max 
#> -1.27817 -0.25460 -0.06713  0.21136  1.44653 
#> 
#> Coefficients:
#>                      Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)           2.37909    0.37679   6.314 3.17e-09 ***
#> Sepal.Width           0.75297    0.11084   6.793 2.68e-10 ***
#> Species: versicolor   1.42257    0.11514  12.355  < 2e-16 ***
#> Species: virginica    1.93784    0.10043  19.296  < 2e-16 ***
#> Species2: versicolor  0.11452    0.08981   1.275    0.204    
#> Species2: virginica  -0.02119    0.09440  -0.224    0.823    
#> ---
#> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#> 
#> Residual standard error: 0.4368 on 144 degrees of freedom
#> Multiple R-squared:  0.7311, Adjusted R-squared:  0.7217 
#> F-statistic:  78.3 on 5 and 144 DF,  p-value: < 2.2e-16
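
The loop above can also be wrapped in a small helper that avoids regular expressions altogether. This is only a sketch, not part of the original answer: relabel_factor_coefs is a hypothetical name, it reads the factor levels from the xlevels component that lm stores on the fitted object, and it assumes main effects with the default treatment contrasts (interaction terms and other contrast codings are named differently and are left unchanged).

```r
# Hypothetical helper: insert `sep` between factor name and level in the
# coefficient names of a fitted lm, using exact matching instead of regex.
relabel_factor_coefs <- function(fit, sep = ": ") {
  orig <- names(fit$coefficients)
  nms <- orig
  for (f in names(fit$xlevels)) {
    lv <- fit$xlevels[[f]]
    idx <- match(paste0(f, lv), orig)  # NA for levels without a coefficient
    hit <- !is.na(idx)
    nms[idx[hit]] <- paste0(f, sep, lv[hit])
  }
  names(fit$coefficients) <- nms
  fit
}

aa <- airquality
aa$Month <- as.factor(aa$Month)
relabelled <- relabel_factor_coefs(lm(Ozone ~ Month + Wind, aa))
names(coef(relabelled))
```

Matching whole names exactly means levels containing regex metacharacters need no escaping, and a numeric predictor such as Wind is never touched.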

Created on 2020-11-30 by the reprex package (v0.3.0)