Result of glm() for logistic regression

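This might be a trivial question, but I don't know where to find the answer. When using glm() for logistic regression in R, if the response variable Y is a factor with values 1 or 2, does the result of glm() correspond to logit(P(Y=1)) or logit(P(Y=2))? And what if Y has the logical values TRUE or FALSE?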

Why not just test it yourself?

output_bool <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
output_num <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
output_fact <- factor(output_num)
var <- rep(c("unlikely", "likely"), each = 100)

glm(output_bool ~ var, binomial)
#> 
#> Call:  glm(formula = output_bool ~ var, family = binomial)
#> 
#> Coefficients:
#> (Intercept)  varunlikely  
#>       1.099       -2.197  
#> 
#> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
#> Null Deviance:       277.3 
#> Residual Deviance: 224.9     AIC: 228.9
glm(output_num ~ var, binomial)
#> Error in eval(family$initialize): y values must be 0 <= y <= 1
glm(output_fact ~ var, binomial)
#> 
#> Call:  glm(formula = output_fact ~ var, family = binomial)
#> 
#> Coefficients:
#> (Intercept)  varunlikely  
#>       1.099       -2.197  
#> 
#> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
#> Null Deviance:       277.3 
#> Residual Deviance: 224.9     AIC: 228.9

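So, we get the correct answer if we use TRUE and FALSE, an error if we use 1 and 2 as numbers, and the correct result if we use 1 and 2 as a factor with two levels, provided the level standing for TRUE comes after the level standing for FALSE. However, we have to be careful about how the factor levels are ordered, or we will get the wrong result: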

output_fact <- factor(output_fact, levels = c("2", "1"))
glm(output_fact ~ var, binomial)
#> 
#> Call:  glm(formula = output_fact ~ var, family = binomial)
#> 
#> Coefficients:
#> (Intercept)  varunlikely  
#>      -1.099        2.197  
#> 
#> Degrees of Freedom: 199 Total (i.e. Null);  198 Residual
#> Null Deviance:       277.3 
#> Residual Deviance: 224.9     AIC: 228.9

(Notice that the intercept and the coefficient have flipped signs.)

Created on 2020-06-21 by the reprex package (v0.3.0)
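If you would rather not read the direction off the sign of the coefficients, one way to check it is to compare the fitted probabilities with the observed share of each outcome. This is only a minimal sketch that reuses the toy data from the reprex above; the names `fit`, `newdata`, `p_hat`, `p_obs`, `is_two` and `fit_explicit` are introduced here for illustration.

```r
# Same toy data as in the reprex above.
output_num  <- c(rep(c(2, 1), c(25, 75)), rep(c(2, 1), c(75, 25)))
output_fact <- factor(output_num)
var <- rep(c("unlikely", "likely"), each = 100)

# The first level is treated as "failure"; glm() models the other one.
levels(output_fact)

fit <- glm(output_fact ~ var, family = binomial)

# Fitted probabilities per group, next to the observed share of Y == 2.
newdata <- data.frame(var = c("likely", "unlikely"))
p_hat <- predict(fit, newdata = newdata, type = "response")
p_obs <- tapply(output_num == 2, var, mean)[c("likely", "unlikely")]
cbind(fitted = as.numeric(p_hat), observed_share_of_2 = as.numeric(p_obs))

# Building the logical response explicitly avoids relying on level order.
is_two <- output_fact == "2"
fit_explicit <- glm(is_two ~ var, family = binomial)
all.equal(coef(fit), coef(fit_explicit))
```

If the two columns agree, the model is describing P(Y = 2), that is, the probability of the non-first level. Spelling the event out as a logical vector, as in `is_two`, is one way to make the modelled outcome explicit regardless of how the factor happens to be ordered.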

Testing is good. If you want the documentation, it's in ?binomial (which is the same as ?family):

For the ‘binomial’ and ‘quasibinomial’ families the response can be specified in one of three ways:

  1. As a factor: ‘success’ is interpreted as the factor not having the first level (and hence usually of having the second level).
  2. As a numerical vector with values between ‘0’ and ‘1’, interpreted as the proportion of successful cases (with the total number of cases given by the ‘weights’).
  3. As a two-column integer matrix: the first column gives the number of successes and the second the number of failures.
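To make the three forms concrete, here is a rough sketch that fits the same toy data each way and compares the coefficients. The data mirror the reprex in the other answer; the names `y`, `grp`, `counts`, `agg` and the `fit_*` objects are made up for this illustration.

```r
# Toy data: 25/100 successes in "unlikely", 75/100 in "likely".
y   <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
grp <- rep(c("unlikely", "likely"), each = 100)

# 1. Factor response: levels are "FALSE", "TRUE", so "TRUE" is the success.
fit_factor <- glm(factor(y) ~ grp, family = binomial)

# Aggregate to one row per group for the other two forms.
counts <- table(grp, y)
agg <- data.frame(
  group     = rownames(counts),
  successes = as.vector(counts[, "TRUE"]),
  failures  = as.vector(counts[, "FALSE"])
)
agg$n    <- agg$successes + agg$failures
agg$prop <- agg$successes / agg$n

# 2. Proportion of successes, with the number of cases passed as weights.
fit_prop <- glm(prop ~ group, family = binomial, weights = n, data = agg)

# 3. Two-column matrix: successes first, failures second.
fit_matrix <- glm(cbind(successes, failures) ~ group,
                  family = binomial, data = agg)

# The coefficient estimates should agree across all three fits.
rbind(factor = unname(coef(fit_factor)),
      prop   = unname(coef(fit_prop)),
      matrix = unname(coef(fit_matrix)))
```

The residual deviance and degrees of freedom differ between the case-level fit and the aggregated fits, since the aggregated data have only two rows, but the coefficients themselves should match.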

It doesn't explicitly say what happens in the logical (TRUE/FALSE) case; for that, you have to know that, when logical values are coerced to numeric, FALSE → 0 and TRUE → 1.
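A quick way to convince yourself of the coercion, again only as a sketch with illustrative names (`y_lgl`, `grp`, `fit_lgl`, `fit_num`):

```r
# Logical values coerce to 0/1.
as.numeric(c(FALSE, TRUE))
#> [1] 0 1

y_lgl <- c(rep(c(TRUE, FALSE), c(25, 75)), rep(c(TRUE, FALSE), c(75, 25)))
grp   <- rep(c("unlikely", "likely"), each = 100)

# Fitting on the logical vector and on its 0/1 version should be equivalent.
fit_lgl <- glm(y_lgl ~ grp, family = binomial)
fit_num <- glm(as.numeric(y_lgl) ~ grp, family = binomial)
all.equal(coef(fit_lgl), coef(fit_num))
```

So a logical response falls under the second form in the documentation: a numeric vector of 0s and 1s with one case per observation, where TRUE counts as the success.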