Result of glm() for logistic regression
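This may be a trivial question, but I don't know where to find the answer. When fitting a logistic regression with glm() in R, if the response variable Y is a factor with values 1 or 2, does the result of glm() correspond to logit(P(Y=1)) or logit(P(Y=2))? What if Y takes the logical values TRUE or FALSE?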
Why not test it yourself?
output_bool <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
output_fact <- factor(output_num)
var <- rep(c("unlikely", "likely"), each = 100)
glm(output_bool ~ var, binomial)
#>
#> Call: glm(formula = output_bool ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> 1.099 -2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
glm(output_num ~ var, binomial)
#> Error in eval(family$initialize): y values must be 0 <= y <= 1
glm(output_fact ~ var, binomial)
#>
#> Call: glm(formula = output_fact ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> 1.099 -2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
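So we get the correct answer if we use TRUE and FALSE, an error if we use 1 and 2 as numbers, and the correct result if we use 1 and 2 as a factor with two levels, provided the TRUE values have a higher factor level than the FALSE values. However, we have to be careful about how the factor levels are ordered, or we will get the wrong result: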
output_fact <- factor(output_fact, levels = c("2", "1"))
glm(output_fact ~ var, binomial)
#>
#> Call: glm(formula = output_fact ~ var, family = binomial)
#>
#> Coefficients:
#> (Intercept) varunlikely
#> -1.099 2.197
#>
#> Degrees of Freedom: 199 Total (i.e. Null); 198 Residual
#> Null Deviance: 277.3
#> Residual Deviance: 224.9 AIC: 228.9
(Note that the intercept and the coefficient have flipped signs.)
Created on 2020-06-21 by the reprex package (v0.3.0)
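To make the sign flip above easier to see, here is a minimal sketch that fits the same data with both level orderings and checks which event each fit models. The names f12, f21, fit12 and fit21 are made up for this illustration; the data are the same as in the reprex.

```r
# Same data as in the reprex above
output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
var <- rep(c("unlikely", "likely"), each = 100)

# The first factor level is treated as "failure"; every other level is "success"
f12 <- factor(output_num, levels = c(1, 2))  # first level "1": models P(Y = 2)
f21 <- factor(output_num, levels = c(2, 1))  # first level "2": models P(Y = 1)

levels(f12)[1]
levels(f21)[1]

fit12 <- glm(f12 ~ var, family = binomial)
fit21 <- glm(f21 ~ var, family = binomial)

# logit(1 - p) = -logit(p), so the two fits should differ only in sign
cbind(first_level_1 = coef(fit12), first_level_2 = coef(fit21))
```

Checking levels(y)[1] before fitting, or setting the levels explicitly in factor(), is one way to be sure which outcome the coefficients describe.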
Testing is good. If you want the documentation, it's in ?binomial (which is the same as ?family):
For the ‘binomial’ and ‘quasibinomial’ families the response can
be specified in one of three ways:
- As a factor: ‘success’ is interpreted as the factor not
having the first level (and hence usually of having the
second level).
- As a numerical vector with values between ‘0’ and ‘1’,
interpreted as the proportion of successful cases (with the
total number of cases given by the ‘weights’).
- As a two-column integer matrix: the first column gives the
number of successes and the second the number of failures.
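As a rough illustration of the second and third forms in the quoted documentation, the sketch below aggregates the reprex data by hand (75 successes out of 100 for "likely", 25 out of 100 for "unlikely") and fits both forms. The object names successes, failures, total, fit_matrix and fit_prop are assumptions made for this example only.

```r
# Aggregated counts, taken from the reprex data in the answer above
var <- factor(c("likely", "unlikely"))
successes <- c(75, 25)
failures <- c(25, 75)
total <- successes + failures

# Two-column matrix: successes in the first column, failures in the second
fit_matrix <- glm(cbind(successes, failures) ~ var, family = binomial)

# Proportion of successes, with the number of cases supplied as weights
fit_prop <- glm(successes / total ~ var, family = binomial, weights = total)

# Both forms should give the same coefficients as the case-level fits
coef(fit_matrix)
coef(fit_prop)
```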
It doesn't explicitly say what happens in the logical (TRUE/FALSE) case; for that, you have to know that when logical values are coerced to numeric, FALSE → 0 and TRUE → 1.
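A small sketch of that coercion, reusing the reprex data; y_lgl, y_num, fit_lgl and fit_num are names invented here. It compares a fit on the logical response with a fit on the same response converted to 0/1 by hand.

```r
# Logical values coerce to 0 and 1
as.numeric(c(FALSE, TRUE))
#> [1] 0 1

# Same data as in the reprex above
y_lgl <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
var <- rep(c("unlikely", "likely"), each = 100)

# Explicit 0/1 version of the same response
y_num <- as.numeric(y_lgl)

fit_lgl <- glm(y_lgl ~ var, family = binomial)
fit_num <- glm(y_num ~ var, family = binomial)

# The two fits should agree, i.e. TRUE is the modelled "success"
all.equal(coef(fit_lgl), coef(fit_num))
```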