Pretty summaries for statistical models
I'm looking for a pretty way to view summaries of statistical models in R. In the example below, I would like to see cyl_6 or cyl.6 instead of cyl6. How can I do that?
library(dplyr)
library(broom)

mean_mpg <- mean(mtcars$mpg)

# Create an indicator for whether miles/(US) gallon is above the mean
mtcars <-
  mtcars %>%
  mutate(mpg_cat = ifelse(mpg > mean_mpg, 1, 0))

mtcars$cyl <- as.factor(mtcars$cyl)

model <-
  mtcars %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")

tidy(model)
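I can think of one way to do this, but it's kludgy: before running the model, rename the columns of the factor's contrast matrix: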
mtcars$cyl <- as.factor(mtcars$cyl)

# Prefix the contrast column names ("6", "8") with "_"
cont <- contrasts(mtcars$cyl)
colnames(cont) <- paste0("_", colnames(cont))
contrasts(mtcars$cyl) <- cont

model <-
  mtcars %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")

tidy(model)
Output:
# A tibble: 5 x 5
  term        estimate std.error  statistic p.value
  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
1 (Intercept)   22.9      24034.  0.000953    0.999
2 cyl_6        -22.4      12326. -0.00182     0.999
3 cyl_8        -44.5      23246. -0.00191     0.998
4 vs            -1.59     13641. -0.000117    1.000
5 am             0.201    13641.  0.0000147   1.000
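If you wanted this behaviour by default, I suppose you could write a modified version of contr.treatment that sets the column names as desired, and then use options(contrasts = ...) to set it as the default? I haven't tested whether that works.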
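A minimal sketch of that idea, in case it helps. The wrapper name `contr.treatment_sep` is hypothetical (it is not part of base R or any package), and the example fits a plain `lm` on `mtcars` only to show the resulting coefficient names. It assumes the wrapper is defined where R can find it by name, for example in the global environment.

```r
# Hypothetical wrapper around stats::contr.treatment() that prefixes each
# contrast column name with "_", so coefficients are named cyl_6, cyl_8, ...
contr.treatment_sep <- function(n, base = 1, contrasts = TRUE, sparse = FALSE) {
  cont <- contr.treatment(n, base = base, contrasts = contrasts, sparse = sparse)
  colnames(cont) <- paste0("_", colnames(cont))
  cont
}

# Use the wrapper for unordered factors; keep contr.poly for ordered factors.
# options() returns the previous values, so they can be restored afterwards.
old_opts <- options(contrasts = c("contr.treatment_sep", "contr.poly"))

dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl + wt, data = dat)
coef_names <- names(coef(fit))
print(coef_names)

# Restore the previous contrast defaults.
options(old_opts)
```

Changing `options(contrasts = ...)` affects every model fitted afterwards in the session, which is why the sketch restores the old setting at the end.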
Just use sub, for example in a pipe.
I'll start by simplifying the model code.
model <-
  mtcars %>%
  mutate(mpg_cat = as.integer(mpg > mean(mpg)),
         cyl = factor(cyl)) %>%
  select(cyl, vs, am, mpg_cat) %>%
  glm(formula = mpg_cat ~ .,
      data = ., family = "binomial")
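Now it's a matter of applying a regular expression: "^cyl" matches "cyl" at the beginning of a string, so only terms that start with "cyl" are changed.
The pipe would be: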
model %>%
  tidy() %>%
  mutate(term = sub("^cyl", "cyl_", term))
## A tibble: 5 x 5
#  term        estimate std.error  statistic p.value
#  <chr>          <dbl>     <dbl>      <dbl>   <dbl>
#1 (Intercept)   22.9      24034.  0.000953    0.999
#2 cyl_6        -22.4      12326. -0.00182     0.999
#3 cyl_8        -44.5      23246. -0.00191     0.998
#4 vs            -1.59     13641. -0.000117    1.000
#5 am             0.201    13641.  0.0000147   1.000
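If the model has several factors, writing one regular expression per variable gets tedious. One possible generalisation, sketched below, reads the factor names and levels from the fitted model's `xlevels` component and relabels only exact matches. The helper `relabel_terms` is a hypothetical name introduced for illustration, not a broom or dplyr function, and it does not handle interaction terms such as `cyl6:wt`.

```r
# Hypothetical helper: insert `sep` between a factor's name and its level
# in a vector of coefficient names, using the factor levels stored in the model.
relabel_terms <- function(term, xlevels, sep = "_") {
  for (f in names(xlevels)) {
    old <- paste0(f, xlevels[[f]])        # names as R builds them, e.g. "cyl6"
    new <- paste0(f, sep, xlevels[[f]])   # desired names, e.g. "cyl_6"
    hit <- match(term, old)               # exact matches only, no regex
    term[!is.na(hit)] <- new[hit[!is.na(hit)]]
  }
  term
}

dat <- transform(mtcars, cyl = factor(cyl))
fit <- lm(mpg ~ cyl + wt, data = dat)

underscored <- relabel_terms(names(coef(fit)), fit$xlevels)
dotted <- relabel_terms(names(coef(fit)), fit$xlevels, sep = ".")
print(underscored)
print(dotted)
```

In the pipe above, this would replace the sub() call with something like `mutate(term = relabel_terms(term, model$xlevels))`. Because it matches whole names rather than prefixes, a numeric variable whose name happens to start with "cyl" is left alone.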