Custom printing of factors/contrasts for lm
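I'm looking for a way to change how lm and summary.lm label factor variables before the results are printed.
For example, I'd like the coefficients to read Month: 6 instead of Month6, i.e. with an extra space and a : between the variable name and the factor level. I don't want to split the factor variable into separate columns, as model.matrix does.
Ideally the change would happen either before lm is evaluated, or after it but before summary.lm is called (the two "# possible action" spots below).
Example: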
> aa = airquality
> aa$Month = as.factor(aa$Month)
> # possible action
> ll = lm(Ozone~Month, aa)
> # possible action
> ss = summary(ll)
> ss
Call:
lm(formula = Ozone ~ Month, data = aa)
Residuals:
Min 1Q Median 3Q Max
-52.115 -16.823 -7.282 13.125 108.038
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.615 5.759 4.101 7.87e-05 ***
Month6 5.829 11.356 0.513 0.609
Month7 35.500 8.144 4.359 2.93e-05 ***
Month8 36.346 8.144 4.463 1.95e-05 ***
Month9 7.833 7.931 0.988 0.325
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 29.36 on 111 degrees of freedom
(37 observations deleted due to missingness)
Multiple R-squared: 0.2352, Adjusted R-squared: 0.2077
F-statistic: 8.536 on 4 and 111 DF, p-value: 4.827e-06
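You can overwrite the names attribute of the coefficients element of the ll object:
# insert ": " between a non-digit prefix and trailing digits, e.g. "Month6" -> "Month: 6"
names(ll$coefficients) <- gsub("^(\\D+)(\\d+)$", "\\1: \\2", names(ll$coefficients))
This gives: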
ll
#> Call:
#> lm(formula = Ozone ~ Month, data = aa)
#>
#> Coefficients:
#> (Intercept) Month: 6 Month: 7 Month: 8 Month: 9
#> 23.615 5.829 35.500 36.346 7.833
and
summary(ll)
#> Call:
#> lm(formula = Ozone ~ Month, data = aa)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -52.115 -16.823 -7.282 13.125 108.038
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 23.615 5.759 4.101 7.87e-05 ***
#> Month: 6 5.829 11.356 0.513 0.609
#> Month: 7 35.500 8.144 4.359 2.93e-05 ***
#> Month: 8 36.346 8.144 4.463 1.95e-05 ***
#> Month: 9 7.833 7.931 0.988 0.325
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 29.36 on 111 degrees of freedom
#> (37 observations deleted due to missingness)
#> Multiple R-squared: 0.2352, Adjusted R-squared: 0.2077
#> F-statistic: 8.536 on 4 and 111 DF, p-value: 4.827e-06
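The regex above relies on every factor level being a number at the end of the coefficient name. A minimal sketch of a more general variant, assuming only main effects need relabelling, uses the factor levels that lm() records in fit$xlevels and compares whole names instead of pattern-matching. relabel_factor_coefs() is a hypothetical helper written for illustration, not part of base R.

```r
# Hypothetical helper (not base R): relabel main-effect factor coefficients,
# e.g. "Month6" -> "Month: 6", using the levels lm() stored in fit$xlevels.
relabel_factor_coefs <- function(fit, sep = ": ") {
  nms <- names(fit$coefficients)
  for (f in names(fit$xlevels)) {
    for (lev in fit$xlevels[[f]]) {
      old <- paste0(f, lev)                   # name lm() builds, e.g. "Month6"
      nms[nms == old] <- paste0(f, sep, lev)  # exact match only, e.g. "Month: 6"
    }
  }
  names(fit$coefficients) <- nms
  fit
}

aa <- airquality
aa$Month <- as.factor(aa$Month)
ll <- relabel_factor_coefs(lm(Ozone ~ Month, aa))
names(coef(ll))
summary(ll)
```

Because it compares whole names, this also covers non-numeric levels such as versicolor and multi-digit levels such as 10. Interaction terms such as Month6:Wind do not equal any pasted name and are left unchanged.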
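For a more general solution, we can loop over the factor names stored in ll$contrasts and use gsub to insert ": " between each factor name and its level in the names found in coefficients (modified per the OP's suggestion):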
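iris2 <- iris
# Species2 is a random copy of Species, so its estimates vary between runs
iris2$Species2 <- sample(unique(iris$Species), nrow(iris2), TRUE)
ll <- lm(Sepal.Length ~ Sepal.Width + Species + Species2, iris2)
# for each factor in the model, insert ": " between its name and its level
for(i in names(ll$contrasts)) {
  alts <- paste0(levels(iris2[[i]]), collapse = "|")
  names(ll$coefficients) <- gsub(glue::glue("^({i})((?:{alts}))$"),
                                 "\\1: \\2", names(ll$coefficients), perl = TRUE)
}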
summary(ll)
#>
#> Call:
#> lm(formula = Sepal.Length ~ Sepal.Width + Species + Species2,
#> data = iris2)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -1.27817 -0.25460 -0.06713 0.21136 1.44653
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 2.37909 0.37679 6.314 3.17e-09 ***
#> Sepal.Width 0.75297 0.11084 6.793 2.68e-10 ***
#> Species: versicolor 1.42257 0.11514 12.355 < 2e-16 ***
#> Species: virginica 1.93784 0.10043 19.296 < 2e-16 ***
#> Species2: versicolor 0.11452 0.08981 1.275 0.204
#> Species2: virginica -0.02119 0.09440 -0.224 0.823
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Residual standard error: 0.4368 on 144 degrees of freedom
#> Multiple R-squared: 0.7311, Adjusted R-squared: 0.7217
#> F-statistic: 78.3 on 5 and 144 DF, p-value: < 2.2e-16
Created on 2020-11-30 by the reprex package (v0.3.0)
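The question also asks about acting before lm() is evaluated. One option, sketched here under the assumption that the default treatment contrasts are wanted, is to give the factor's contrast matrix column names that already start with the separator. model.matrix() names each dummy column by pasting the variable name onto the contrast column name (that is where "Month6" comes from), so the coefficients come out labelled directly.

```r
aa <- airquality
aa$Month <- as.factor(aa$Month)

# Treatment contrasts: one column per non-baseline level ("6", "7", "8", "9").
cm <- contr.treatment(levels(aa$Month))
# Prefix each column name with the separator; model.matrix() then pastes the
# variable name in front, giving "Month: 6" instead of "Month6".
colnames(cm) <- paste0(": ", colnames(cm))
contrasts(aa$Month) <- cm

ll <- lm(Ozone ~ Month, aa)
names(coef(ll))
summary(ll)
```

This only renames the columns, not the coding itself. Because contr.treatment is what lm uses by default for unordered factors, the estimates should be the same as in the original fit.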