In GLM, why are some coefficients NA even when the data is given?

In the following example

df <- data.frame(place = c("South","South","North"),
                 temperature = c(30,30,20),
                 outlookfine=c(TRUE,TRUE,FALSE)
                 )
glm.fit <- glm(outlookfine ~ .,df, family=binomial)

glm.fit

the output is

Call:  glm(formula = outlookfine ~ ., family = binomial, data = df)

Coefficients:
(Intercept)   placeSouth  temperature  
     -23.57        47.13           NA  

Degrees of Freedom: 2 Total (i.e. Null);  1 Residual
Null Deviance:      3.819 
Residual Deviance: 3.496e-10    AIC: 4

Why is the coefficient for temperature NA?

[Update]

I experimented with more data

df <- data.frame(place = c("South","South","North","East","West"),
                 temperature = c(30,17,20,12,15),
                 outlookfine=c(TRUE,TRUE,FALSE,FALSE,TRUE)
                 )
glm.fit <- glm(outlookfine ~ .,df, family= binomial )
glm.fit

This time every coefficient gets an estimate

Call:  glm(formula = outlookfine ~ ., family = binomial, data = df)

Coefficients:
(Intercept)   placeNorth   placeSouth    placeWest  temperature  
 -2.457e+01   -7.094e-07    4.913e+01    4.913e+01    8.868e-08  

Degrees of Freedom: 4 Total (i.e. Null);  0 Residual
Null Deviance:      6.73 
Residual Deviance: 2.143e-10    AIC: 10

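I think it is because place and temperature are highly correlated. In the first data set they are in fact perfectly collinear: every South row has temperature 30 and the North row has temperature 20, so temperature adds no information beyond place. The model matrix is rank deficient, and glm reports NA for the coefficient it cannot estimate separately.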
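One way to check this is to compare the number of columns of the design matrix with its rank. The sketch below rebuilds the first data set from the question; the name X is only an illustrative choice.

```r
# First data set from the question: temperature is 30 for South and 20 for North,
# so it is an exact linear function of the place dummy.
df <- data.frame(place = c("South", "South", "North"),
                 temperature = c(30, 30, 20),
                 outlookfine = c(TRUE, TRUE, FALSE))

# Design matrix that glm() builds for outlookfine ~ .
X <- model.matrix(outlookfine ~ ., data = df)

ncol(X)      # number of coefficients requested
qr(X)$rank   # number of linearly independent columns

# The exact dependency: temperature = 20 * (Intercept) + 10 * placeSouth
all.equal(unname(X[, "temperature"]),
          unname(20 * X[, "(Intercept)"] + 10 * X[, "placeSouth"]))
```

When the rank is smaller than the number of columns, at least one coefficient cannot be identified, and glm sets it to NA.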

If you fit either of the following models instead, you will get the same fitted(glm.fit):

glm.fit <- glm(outlookfine ~ place,df, family=binomial)

glm.fit <- glm(outlookfine ~ temperature, df, family=binomial)
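A minimal sketch of that comparison, assuming the first (three-row) data set from the question; the object names fit_place, fit_temp and fit_both are made up for illustration.

```r
df <- data.frame(place = c("South", "South", "North"),
                 temperature = c(30, 30, 20),
                 outlookfine = c(TRUE, TRUE, FALSE))

# Perfect separation makes glm() warn that fitted probabilities are numerically
# 0 or 1; the warnings are suppressed here only to keep the output short.
fit_place <- suppressWarnings(glm(outlookfine ~ place, data = df, family = binomial))
fit_temp  <- suppressWarnings(glm(outlookfine ~ temperature, data = df, family = binomial))
fit_both  <- suppressWarnings(glm(outlookfine ~ ., data = df, family = binomial))

# Side-by-side fitted probabilities from the three models
cbind(place = fitted(fit_place),
      temperature = fitted(fit_temp),
      both = fitted(fit_both))

# Compare the fitted values up to a small numerical tolerance
all.equal(fitted(fit_place), fitted(fit_temp), tolerance = 1e-6)
```

The three models span the same column space, so the second predictor in the full model has nothing left to explain.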

Another example where a perfectly correlated variable gets an NA coefficient:

df <- iris
df$SL <- df$Sepal.Length * 2 + 1
glm(Sepal.Width ~ Sepal.Length + SL, data  = df)
Call:  glm(formula = Sepal.Width ~ Sepal.Length + SL, data = df)

Coefficients:
 (Intercept)  Sepal.Length            SL  
     3.41895      -0.06188            NA  

Degrees of Freedom: 149 Total (i.e. Null);  148 Residual
Null Deviance:        28.31 
Residual Deviance: 27.92  AIC: 179.5
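To see which term was dropped and how it depends on the others, alias() can be applied to the fit. This is a sketch using lm(), which for the default gaussian family estimates the same model as the glm() call above; the names fit and a are illustrative.

```r
df <- iris
df$SL <- df$Sepal.Length * 2 + 1

# Same linear model as the gaussian glm() above
fit <- lm(Sepal.Width ~ Sepal.Length + SL, data = df)

# Names of the coefficients that could not be estimated
names(coef(fit))[is.na(coef(fit))]

# alias() reports each aliased term as a linear combination of the retained ones
a <- alias(fit)
a$Complete
```

The Complete component should have one row for SL, expressing it in terms of the intercept and Sepal.Length, which matches how SL was constructed.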