交互项未显示在 glm 中

Interaction terms not showing up in glm

我对 glm 的一个问题感到困惑,其中相互作用项不会出现在估计结果中,除非它们作为单独的变量输入。我无法使用 mtcars 数据集重现问题,但我的数据没有任何异常。

下面的前三个模型给出了相同的结果,而第四个给出了预期的结果: 代码:

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)

probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
summary(probit_model2)

输出:

> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original, family = binomial(link = "probit"), 
    data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original*pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original**pop1_3km_original, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Error in terms.formula(formula, data = data) : invalid power in formula
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original * pop1_3km_original, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.784  -3.079  -2.257   2.729   8.409  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.33000    0.01479   22.31   <2e-16 ***
wgrp1             -0.77032    0.01717  -44.87   <2e-16 ***
pop1_3km_original -0.44447    0.01958  -22.70   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29474  on 2325  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29642

Number of Fisher Scoring iterations: 5

> 
> df <- df %>% mutate(squared = pop1_3km_original*pop1_3km_original, .after=pop1_3km_original)
> 
> probit_model2 <- glm(any_ics99 ~ wgrp1 + pop1_3km_original+ squared, family = binomial(link = "probit"), data = df, weights = indiv_weight)
Warning message:
In eval(family$initialize) : non-integer #successes in a binomial glm!
> summary(probit_model2)

Call:
glm(formula = any_ics99 ~ wgrp1 + pop1_3km_original + squared, 
    family = binomial(link = "probit"), data = df, weights = indiv_weight)

Deviance Residuals: 
   Min      1Q  Median      3Q     Max  
-5.839  -3.056  -2.288   2.755   8.401  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)        0.31527    0.01689  18.669  < 2e-16 ***
wgrp1             -0.76865    0.01720 -44.702  < 2e-16 ***
pop1_3km_original -0.35047    0.05525  -6.344 2.24e-10 ***
squared           -0.07031    0.03878  -1.813   0.0698 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 31927  on 2327  degrees of freedom
Residual deviance: 29471  on 2324  degrees of freedom
  (1239 observations deleted due to missingness)
AIC: 29641

Number of Fisher Scoring iterations: 5

这种交互实际上是一个多项式,*在这里不起作用。您可能想要添加平方项,但使用 I() 函数来防止 glm 将其解释为算术运算。

glm(am ~ mpg*mpg, family=binomial(link='probit'), mtcars)$coe  ## won't work
# (Intercept)         mpg 
#  -3.9102065   0.1822684 

glm(am ~ mpg + I(mpg^2), family=binomial(link='probit'), mtcars)$coe  ## works
#  (Intercept)          mpg     I(mpg^2) 
# -1.504927683 -0.064500921  0.006066692 

或使用poly(..., raw=TRUE)

glm(am ~ poly(mpg, 2, raw=TRUE), family=binomial(link='probit'), mtcars)$coe
 # (Intercept) poly(mpg, 2, raw = TRUE)1 poly(mpg, 2, raw = TRUE)2 
# -1.504927683              -0.064500921               0.006066692 

或者预先计算一下。

glm(am ~ mpg + mpgsq, family=binomial(link='probit'), transform(mtcars, mpgsq=mpg^2))$coe
#  (Intercept)          mpg        mpgsq 
# -1.504927683 -0.064500921  0.006066692