In GLM, why are some coefficients NA even when the data is given?
In the following example:
df <- data.frame(place = c("South","South","North"),
temperature = c(30,30,20),
outlookfine=c(TRUE,TRUE,FALSE)
)
glm.fit <- glm(outlookfine ~ .,df, family=binomial)
glm.fit
the output is:
Call: glm(formula = outlookfine ~ ., family = binomial, data = df)
Coefficients:
(Intercept) placeSouth temperature
-23.57 47.13 NA
Degrees of Freedom: 2 Total (i.e. Null); 1 Residual
Null Deviance: 3.819
Residual Deviance: 3.496e-10 AIC: 4
Why is the temperature coefficient NA?
[Update]
I experimented with more data:
df <- data.frame(place = c("South","South","North","East","West"),
temperature = c(30,17,20,12,15),
outlookfine=c(TRUE,TRUE,FALSE,FALSE,TRUE)
)
glm.fit <- glm(outlookfine ~ .,df, family= binomial )
glm.fit
This time the output is:
Call: glm(formula = outlookfine ~ ., family = binomial, data = df)
Coefficients:
(Intercept) placeNorth placeSouth placeWest temperature
-2.457e+01 -7.094e-07 4.913e+01 4.913e+01 8.868e-08
Degrees of Freedom: 4 Total (i.e. Null); 0 Residual
Null Deviance: 6.73
Residual Deviance: 2.143e-10 AIC: 10
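I think it is because place and temperature are highly correlated. In the first data set they are perfectly collinear: every South row has temperature 30 and the North row has 20, so temperature adds no information beyond place and glm reports its coefficient as NA.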
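One way to check this is to compare the rank of the model matrix with its number of columns. The sketch below rebuilds both data frames from the question under the made-up names df3 and df5 (they are not in the original post) and uses only base R's model.matrix() and qr().

```r
# df3 and df5 are illustrative names for the two data frames in the question.
df3 <- data.frame(place = c("South", "South", "North"),
                  temperature = c(30, 30, 20),
                  outlookfine = c(TRUE, TRUE, FALSE))
df5 <- data.frame(place = c("South", "South", "North", "East", "West"),
                  temperature = c(30, 17, 20, 12, 15),
                  outlookfine = c(TRUE, TRUE, FALSE, FALSE, TRUE))

# Design matrices built from the same formula that glm() was given.
X3 <- model.matrix(outlookfine ~ ., df3)
X5 <- model.matrix(outlookfine ~ ., df5)

# If the rank is lower than the number of columns, some column is redundant
# and its coefficient cannot be estimated.
rank3 <- qr(X3)$rank
rank5 <- qr(X5)$rank
c(columns = ncol(X3), rank = rank3)
c(columns = ncol(X5), rank = rank5)

# In df3, temperature is an exact linear function of the place dummy.
all(X3[, "temperature"] == 20 + 10 * X3[, "placeSouth"])
```

In the second data set the two South rows have different temperatures (30 and 17), so temperature is no longer determined by place and its coefficient can be estimated.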
With the original three-row data, you get the same fitted(glm.fit) values (up to numerical tolerance) if you fit either
glm.fit <- glm(outlookfine ~ place,df, family=binomial)
or
glm.fit <- glm(outlookfine ~ temperature, df, family=binomial)
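A minimal sketch of that comparison on the three-row data follows. The object names full, by_place and by_temp are illustrative only, and suppressWarnings() is there because glm may warn when the outcome is perfectly separated, as it is here.

```r
df <- data.frame(place = c("South", "South", "North"),
                 temperature = c(30, 30, 20),
                 outlookfine = c(TRUE, TRUE, FALSE))

# Fit the full model and the two single-predictor models.
full     <- suppressWarnings(glm(outlookfine ~ ., df, family = binomial))
by_place <- suppressWarnings(glm(outlookfine ~ place, df, family = binomial))
by_temp  <- suppressWarnings(glm(outlookfine ~ temperature, df, family = binomial))

# Compare fitted probabilities with a loose tolerance, since the fits only
# approach 0 and 1 numerically.
same_place <- isTRUE(all.equal(fitted(full), fitted(by_place), tolerance = 1e-6))
same_temp  <- isTRUE(all.equal(fitted(full), fitted(by_temp),  tolerance = 1e-6))
c(place = same_place, temperature = same_temp)
```

Either predictor on its own carries everything the full model can use, which is why one of the two coefficients has to be dropped.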
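Another example in which a correlated variable gets an NA coefficient. Here SL is an exact linear function of Sepal.Length: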
df <- iris
df$SL <- df$Sepal.Length * 2 + 1
glm(Sepal.Width ~ Sepal.Length + SL, data = df)
Call: glm(formula = Sepal.Width ~ Sepal.Length + SL, data = df)
Coefficients:
(Intercept) Sepal.Length SL
3.41895 -0.06188 NA
Degrees of Freedom: 149 Total (i.e. Null); 148 Residual
Null Deviance: 28.31
Residual Deviance: 27.92 AIC: 179.5
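Which of the two collinear terms ends up as NA depends on the order in the formula: the later one is dropped. The sketch below refits the iris model with the terms swapped; fit_a, fit_b and the other helper names are illustrative and not part of the original post.

```r
df <- iris
df$SL <- df$Sepal.Length * 2 + 1

# Same two predictors, listed in opposite orders.
fit_a <- glm(Sepal.Width ~ Sepal.Length + SL, data = df)
fit_b <- glm(Sepal.Width ~ SL + Sepal.Length, data = df)

# Name of the coefficient reported as NA in each fit.
na_a <- names(which(is.na(coef(fit_a))))
na_b <- names(which(is.na(coef(fit_b))))
na_a
na_b

# The fitted values do not depend on which redundant term is dropped.
same_fit <- isTRUE(all.equal(fitted(fit_a), fitted(fit_b)))
same_fit
```

So an NA coefficient here does not mean the variable is useless on its own, only that it is redundant given the terms that come before it.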