具有交互作用的线性回归的聚合系数 [r]

Aggregate coeffcients from linear regression with interactions [r]

我正在尝试编写一个函数,该函数将采用线性回归模型、设计矩阵 X 并将 return 系数的边际效应。

边际效应在回归的汇总输出中给出。但是,如果我们的模型中存在交互作用,我们必须汇总系数。

我在R里准备了一个小例子:


library(tidyverse)


# Desing Matrix for regression with interaction ---------------------------

X <- model.matrix(hp ~ factor(gear) + disp + mpg + wt + wt:factor(gear), data = mtcars)
y <- mtcars$hp


lm(y ~ X - 1, data = mtcars) %>% 
  summary()


# Model Interpretation ----------------------------------------------------

# Our reference category is 'Gear3'.
# Meaning that X(Intercept) is an intercept for Gear3
# If I want an intercept for Gear4 and Gear5 coefs Gear4 and Gear5 are deviations from refrence cathegory
# So Intercept for Gear4 is: X(Intercept) + Xfactor(gear)4 = 166.0265 + 30.2619 = 196.2884
# Same Idea hold for Gear5: X(Intercept) + Xfactor(gear)5 = 166.0265 -58.2724 = 107.7541


# Now if we want to interpret wt as a marginal affect, again we need to take into account our created interactions.

# Xwt is marginal effect of wt for the regrence cathegory (Gear3)
# If we want marginal Affect for (Gear4 cars) we need: Xwt + Xfactor(gear)4:wt = -6.4985 -9.8580 = -16.3565
# SAme Idea for gear5: Xwt + Xfactor(gear)5:wt = -6.4985 + 49.6390 = 43.1405

I would like to write a function that would take model object and desing matrix and would return a dataframe with our aggregated marginal effects:

万一我想要一个函数来 return 一个数据框:

# X(Intercept)gear3: 166.0265
# X(Intercept)gear4: 196.2884
# X(Intercept)gear5:  107.7541
# 
# Xwtgear3:-6.4985
# Xwtgear4:-16.3565
# Xwtgear5:43.1405

对于用户而言,交互的边际效应已经 precalculated/aggregated。 到目前为止,我能够创建一些还不够概括的概念。

因此,我想询问如何使用 dplyrdatatable 编写函数的任何想法。

我不确定你是对输出感兴趣还是对自己做的方式感兴趣。

关于前者,您可以使用 {interactions} 包,它会给您想要的输出。关于后者,您可以查看 {interactions} 程序包如何在幕后计算此输出。

library(dplyr)
library(interactions)

mtcars2 <- mtcars %>% mutate(gear = factor(gear))
fit <- lm(hp ~ wt * gear + disp + mpg, data = mtcars2)
slopes <- sim_slopes(fit, pred = wt, modx = gear, centered = "none")
#> Warning: Johnson-Neyman intervals are not available for factor moderators.

slopes$ints
#>   Value of gear     Est.      S.E.       2.5%    97.5%   t val.          p
#> 1             5 107.7540 103.74667 -106.36856 321.8766 1.038626 0.30932987
#> 2             4 196.2884 100.91523  -11.99042 404.5672 1.945082 0.06357154
#> 3             3 166.0265  71.71678   18.01029 314.0426 2.315029 0.02947996

slopes$slopes
#>   Value of gear       Est.     S.E.      2.5%    97.5%     t val.         p
#> 1             5  43.140490 26.69021 -11.94539 98.22637  1.6163415 0.1190910
#> 2             4 -16.356572 20.22409 -58.09705 25.38390 -0.8087667 0.4265945
#> 3             3  -6.498535 15.37599 -38.23302 25.23595 -0.4226417 0.6763194

reprex package (v0.3.0)

于 2021-01-03 创建

如果您希望两个结果合而为一 data.frame 您可以轻松地将它们绑定在一起:

bind_rows(mutate(slopes$ints, term = "(Intercept)"),
          mutate(slopes$slopes, term = "(Interaction)")) %>% 
            select(term, "Value of gear", Est., p)

#>            term Value of gear       Est.          p
#> 1   (Intercept)             5 107.754033 0.30932987
#> 2   (Intercept)             4 196.288380 0.06357154
#> 3   (Intercept)             3 166.026451 0.02947996
#> 4 (Interaction)             5  43.140490 0.11909097
#> 5 (Interaction)             4 -16.356572 0.42659451
#> 6 (Interaction)             3  -6.498535 0.67631941