r lm vectorised control variables
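I often have to write out long equations in which the control variables stay the same.
For example, hp is my variable of interest (x), which changes between models, and vs + am + gear + carb are my control variables: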
lm(disp ~ hp + vs + am + gear + carb, mtcars)
Then my x is drat, and then wt, but my controls stay the same:
lm(disp ~ drat + vs + am + gear + carb, mtcars)
lm(disp ~ wt + vs + am + gear + carb, mtcars)
It would sometimes be really useful to be able to simplify the equation to something like:
y = 'disp'
x = 'hp'
controls = 'vs + am + gear + carb'
lm(y ~ x + controls, mtcars)
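Any idea how to achieve this?
The code below builds the formula as a string (lightly edited following @ZheyuanLi's comment) to pass to lm, and uses the map function from purrr (a tidyverse package) to fit a separate model for each variable in the x vector. Each element of the list models holds a model object, and each element's name is the value of x used in that model's formula.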
library(tidyverse)
y = 'disp'
x = c('hp','wt')
controls=c("vs","am","gear","carb")
models = map(setNames(x,x),
             ~ lm(paste(y, paste(c(.x, controls), collapse="+"), sep="~"),
                  data=mtcars))
map(models, summary)
$hp

Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
    sep = "~"), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-85.524 -19.153   1.109  14.957 115.804

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 261.9238    73.2477   3.576   0.0014 **
hp            1.2021     0.2453   4.900 4.38e-05 ***
vs          -63.7135    26.5957  -2.396   0.0241 *
am          -56.0468    30.7338  -1.824   0.0797 .
gear        -31.6231    23.4816  -1.347   0.1897
carb        -14.3237    10.1169  -1.416   0.1687
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 47.97 on 26 degrees of freedom
Multiple R-squared:  0.8743, Adjusted R-squared:  0.8502
F-statistic: 36.18 on 5 and 26 DF,  p-value: 6.547e-11

$wt

Call:
lm(formula = paste(y, paste(c(.x, controls), collapse = "+"),
    sep = "~"), data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-74.153 -36.993  -2.097  30.616 102.331

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   28.875    108.220   0.267  0.79172
wt            88.577     18.810   4.709 7.25e-05 ***
vs           -92.669     25.186  -3.679  0.00107 **
am            -3.734     34.662  -0.108  0.91503
gear          -4.688     25.271  -0.186  0.85427
carb          -8.455      9.662  -0.875  0.38955
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 48.88 on 26 degrees of freedom
Multiple R-squared:  0.8695, Adjusted R-squared:  0.8445
F-statistic: 34.66 on 5 and 26 DF,  p-value: 1.056e-10
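
If you would rather not load the tidyverse, the same idea can probably be done in base R. This is only a sketch, not part of the answer above: it swaps the paste() string for stats::reformulate(), which builds a formula from a character vector of terms plus a response, and uses lapply() in place of map(). The names fit_one, models_base and coef_of_interest are made up for illustration.

```r
# Base-R sketch (assumption: same y, x and controls as in the answer above)
y <- "disp"
x <- c("hp", "wt")
controls <- c("vs", "am", "gear", "carb")

# Hypothetical helper: fit one model for a single variable of interest `v`
fit_one <- function(v) {
  # reformulate() joins the term labels with "+" and attaches the response
  f <- reformulate(c(v, controls), response = y)
  lm(f, data = mtcars)
}

# Named list of models, one per element of x (names come from setNames)
models_base <- lapply(setNames(x, x), fit_one)

# Pull out the coefficient on the variable of interest from each model
coef_of_interest <- sapply(names(models_base),
                           function(v) coef(models_base[[v]])[[v]])
coef_of_interest
```

As with the purrr version, lapply(models_base, summary) prints the full summary for each model; the last line here just collects the one coefficient that changes between models into a named vector.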