获取许多模型中特定变量的 p 值以及其他自变量的所有可能组合
Get p values for a specific variable in many models with all possible combinations of other independent variables
我正在尝试 运行 许多具有一组自变量的所有可能组合的回归模型。
在这个例子中,我对 cyl
的系数以及 xlist
中列出的其他变量的所有可能组合感兴趣。
df <- mtcars
md <- "mpg ~ cyl"
xlist <- c("disp", "hp", "am")
n <- length(xlist)
# get a list of all possible combinations of xlist
comb_lst <- unlist(lapply(1:n, function(x) combn(xlist, x, simplify=F)), recursive = F)
# get a list of all models
md_lst <- lapply(comb_lst, function(x) paste(md, "+", paste(x, collapse = "+")))
# run those models and obtain coefficients for cyl
coefs <- unlist(lapply(md_lst, function(x) lm(as.formula(x),data=df)$coe[2]))
获取 cyl
的所有系数效果很好。但是,我不知道如何获得与每个系数对应的 p 值。
pvalues <- lapply(md, function(x) lm(as.formula(x),data=df)$?[2]))
如有任何建议,我们将不胜感激。
Disclaimer:: 这个答案使用了 manymodelr 的开发者版本,我也碰巧写过。然后您可以继续只选择您感兴趣的那些变量。
Map(function(x) manymodelr::extract_model_info(x,"p_value"),
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
# Just cyl
Map(function(x) manymodelr::extract_model_info(x,"p_value")["cyl"],
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
如果您不想使用包:
Map(function(x) coef(summary(x))[,4]["cyl"],
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
结果::(第一部分)
[[1]]
(Intercept) cyl disp
4.022869e-14 3.366495e-02 5.418572e-02
[[2]]
(Intercept) cyl hp
1.620660e-16 4.803752e-04 2.125285e-01
[[3]]
(Intercept) cyl am
7.694408e-14 1.284560e-07 5.635445e-02
[[4]]
(Intercept) cyl disp hp
1.537198e-13 1.349044e-01 8.092901e-02 3.249519e-01
[[5]]
(Intercept) cyl disp am
2.026114e-12 2.823412e-02 1.544849e-01 1.610559e-01
[[6]]
(Intercept) cyl hp am
9.270924e-12 8.635578e-02 1.692706e-02 5.464020e-03
[[7]]
(Intercept) cyl disp hp am
3.724625e-11 2.800850e-01 4.760672e-01 4.416647e-02 2.520516e-02
使用no包的cyl
每个系数的简单方法:
pvalues <- lapply(md_lst, function(x) summary(lm(as.formula(x),data=df))$coefficients[,4])
[[1]]
(Intercept) cyl disp
4.022869e-14 3.366495e-02 5.418572e-02
[[2]]
(Intercept) cyl hp
1.620660e-16 4.803752e-04 2.125285e-01
[[3]]
(Intercept) cyl am
7.694408e-14 1.284560e-07 5.635445e-02
[[4]]
(Intercept) cyl disp hp
1.537198e-13 1.349044e-01 8.092901e-02 3.249519e-01
[[5]]
(Intercept) cyl disp am
2.026114e-12 2.823412e-02 1.544849e-01 1.610559e-01
[[6]]
(Intercept) cyl hp am
9.270924e-12 8.635578e-02 1.692706e-02 5.464020e-03
[[7]]
(Intercept) cyl disp hp am
3.724625e-11 2.800850e-01 4.760672e-01 4.416647e-02 2.520516e-02
全部使用broom::glance
:
pvalues <- lapply(md_lst, function(x) glance(summary(lm(as.formula(x),data=df)))$p.value)
[[1]]
[1] 1.057904e-09
[[2]]
[1] 3.161781e-09
[[3]]
[1] 1.093687e-09
[[4]]
[1] 5.053802e-09
[[5]]
[1] 3.060153e-09
[[6]]
[1] 4.790959e-10
[[7]]
[1] 2.540038e-09
我正在尝试 运行 许多具有一组自变量的所有可能组合的回归模型。
在这个例子中,我对 cyl
的系数以及 xlist
中列出的其他变量的所有可能组合感兴趣。
df <- mtcars
md <- "mpg ~ cyl"
xlist <- c("disp", "hp", "am")
n <- length(xlist)
# get a list of all possible combinations of xlist
comb_lst <- unlist(lapply(1:n, function(x) combn(xlist, x, simplify=F)), recursive = F)
# get a list of all models
md_lst <- lapply(comb_lst, function(x) paste(md, "+", paste(x, collapse = "+")))
# run those models and obtain coefficients for cyl
coefs <- unlist(lapply(md_lst, function(x) lm(as.formula(x),data=df)$coe[2]))
获取 cyl
的所有系数效果很好。但是,我不知道如何获得与每个系数对应的 p 值。
pvalues <- lapply(md, function(x) lm(as.formula(x),data=df)$?[2]))
如有任何建议,我们将不胜感激。
Disclaimer:: 这个答案使用了 manymodelr 的开发者版本,我也碰巧写过。然后您可以继续只选择您感兴趣的那些变量。
Map(function(x) manymodelr::extract_model_info(x,"p_value"),
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
# Just cyl
Map(function(x) manymodelr::extract_model_info(x,"p_value")["cyl"],
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
如果您不想使用包:
Map(function(x) coef(summary(x))[,4]["cyl"],
lapply(md_lst, function(x) do.call(lm, list(formula = x, data=mtcars))))
结果::(第一部分)
[[1]]
(Intercept) cyl disp
4.022869e-14 3.366495e-02 5.418572e-02
[[2]]
(Intercept) cyl hp
1.620660e-16 4.803752e-04 2.125285e-01
[[3]]
(Intercept) cyl am
7.694408e-14 1.284560e-07 5.635445e-02
[[4]]
(Intercept) cyl disp hp
1.537198e-13 1.349044e-01 8.092901e-02 3.249519e-01
[[5]]
(Intercept) cyl disp am
2.026114e-12 2.823412e-02 1.544849e-01 1.610559e-01
[[6]]
(Intercept) cyl hp am
9.270924e-12 8.635578e-02 1.692706e-02 5.464020e-03
[[7]]
(Intercept) cyl disp hp am
3.724625e-11 2.800850e-01 4.760672e-01 4.416647e-02 2.520516e-02
使用no包的cyl
每个系数的简单方法:
pvalues <- lapply(md_lst, function(x) summary(lm(as.formula(x),data=df))$coefficients[,4])
[[1]]
(Intercept) cyl disp
4.022869e-14 3.366495e-02 5.418572e-02
[[2]]
(Intercept) cyl hp
1.620660e-16 4.803752e-04 2.125285e-01
[[3]]
(Intercept) cyl am
7.694408e-14 1.284560e-07 5.635445e-02
[[4]]
(Intercept) cyl disp hp
1.537198e-13 1.349044e-01 8.092901e-02 3.249519e-01
[[5]]
(Intercept) cyl disp am
2.026114e-12 2.823412e-02 1.544849e-01 1.610559e-01
[[6]]
(Intercept) cyl hp am
9.270924e-12 8.635578e-02 1.692706e-02 5.464020e-03
[[7]]
(Intercept) cyl disp hp am
3.724625e-11 2.800850e-01 4.760672e-01 4.416647e-02 2.520516e-02
全部使用broom::glance
:
pvalues <- lapply(md_lst, function(x) glance(summary(lm(as.formula(x),data=df)))$p.value)
[[1]]
[1] 1.057904e-09
[[2]]
[1] 3.161781e-09
[[3]]
[1] 1.093687e-09
[[4]]
[1] 5.053802e-09
[[5]]
[1] 3.060153e-09
[[6]]
[1] 4.790959e-10
[[7]]
[1] 2.540038e-09