交互项未显示在 glm 中
Interaction terms not showing up in glm
我对 glm 的一个问题感到困惑,其中相互作用项不会出现在估计结果中,除非它们作为单独的变量输入。我无法使用 mtcars 数据集重现问题,但我的数据没有任何异常。
下面的前三个模型给出了相同的结果,而第四个给出了预期的结果:
代码:
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
输出:
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"),
data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Error in terms.formula(formula, data = data) : invalid power in formula
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original + squared,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.839 -3.056 -2.288 2.755 8.401
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.31527 0.01689 18.669 < 2e-16 ***
wgrp1 -0.76865 0.01720 -44.702 < 2e-16 ***
pop1_3km_original -0.35047 0.05525 -6.344 2.24e-10 ***
squared -0.07031 0.03878 -1.813 0.0698 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29471 on 2324 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29641
Number of Fisher Scoring iterations: 5
这种交互实际上是一个多项式,*
在这里不起作用。您可能想要添加平方项,但使用 I()
函数来防止 glm
将其解释为算术运算。
glm(am ~ mpg*mpg, family=binomial(link='probit'), mtcars)$coe ## won't work
# (Intercept) mpg
# -3.9102065 0.1822684
glm(am ~ mpg + I(mpg^2), family=binomial(link='probit'), mtcars)$coe ## works
# (Intercept) mpg I(mpg^2)
# -1.504927683 -0.064500921 0.006066692
或使用poly(..., raw=TRUE)
glm(am ~ poly(mpg, 2, raw=TRUE), family=binomial(link='probit'), mtcars)$coe
# (Intercept) poly(mpg, 2, raw = TRUE)1 poly(mpg, 2, raw = TRUE)2
# -1.504927683 -0.064500921 0.006066692
或者预先计算一下。
glm(am ~ mpg + mpgsq, family=binomial(link='probit'), transform(mtcars, mpgsq=mpg^2))$coe
# (Intercept) mpg mpgsq
# -1.504927683 -0.064500921 0.006066692
我对 glm 的一个问题感到困惑,其中相互作用项不会出现在估计结果中,除非它们作为单独的变量输入。我无法使用 mtcars 数据集重现问题,但我的数据没有任何异常。
下面的前三个模型给出了相同的结果,而第四个给出了预期的结果: 代码:
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)
输出:
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"),
data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Error in terms.formula(formula, data = data) : invalid power in formula
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.784 -3.079 -2.257 2.729 8.409
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.33000 0.01479 22.31 <2e-16 ***
wgrp1 -0.77032 0.01717 -44.87 <2e-16 ***
pop1_3km_original -0.44447 0.01958 -22.70 <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29474 on 2325 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29642
Number of Fisher Scoring iterations: 5
>
> df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
>
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)
Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original + squared,
family = binomial(link = "probit"), data = df, weights = indiv_weight)
Deviance Residuals:
Min 1Q Median 3Q Max
-5.839 -3.056 -2.288 2.755 8.401
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.31527 0.01689 18.669 < 2e-16 ***
wgrp1 -0.76865 0.01720 -44.702 < 2e-16 ***
pop1_3km_original -0.35047 0.05525 -6.344 2.24e-10 ***
squared -0.07031 0.03878 -1.813 0.0698 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 31927 on 2327 degrees of freedom
Residual deviance: 29471 on 2324 degrees of freedom
(1239 observations deleted due to missingness)
AIC: 29641
Number of Fisher Scoring iterations: 5
这种交互实际上是一个多项式,*
在这里不起作用。您可能想要添加平方项,但使用 I()
函数来防止 glm
将其解释为算术运算。
glm(am ~ mpg*mpg, family=binomial(link='probit'), mtcars)$coe ## won't work
# (Intercept) mpg
# -3.9102065 0.1822684
glm(am ~ mpg + I(mpg^2), family=binomial(link='probit'), mtcars)$coe ## works
# (Intercept) mpg I(mpg^2)
# -1.504927683 -0.064500921 0.006066692
或使用poly(..., raw=TRUE)
glm(am ~ poly(mpg, 2, raw=TRUE), family=binomial(link='probit'), mtcars)$coe
# (Intercept) poly(mpg, 2, raw = TRUE)1 poly(mpg, 2, raw = TRUE)2
# -1.504927683 -0.064500921 0.006066692
或者预先计算一下。
glm(am ~ mpg + mpgsq, family=binomial(link='probit'), transform(mtcars, mpgsq=mpg^2))$coe
# (Intercept) mpg mpgsq
# -1.504927683 -0.064500921 0.006066692