R: run lm on selected variables
I have about 26 variables on which I need to run the following analysis:
model1=lm(var1~condition*time,data=main_df)
How can I avoid writing this 26 times and easily run the same analysis on variables 1 to 26?
Here is a way to derive the different formulas using regsubsets:
library(leaps)
data(swiss)
a <- regsubsets(Fertility ~ ., data = swiss, nbest = 1000, method = "exhaustive", intercept = FALSE, really.big = TRUE)
b <- summary(a)$which # logical matrix: one row per model, TRUE = predictor included
b[1:15, ]
Agriculture Examination Education Catholic Infant.Mortality
1 FALSE FALSE FALSE FALSE TRUE
1 TRUE FALSE FALSE FALSE FALSE
1 FALSE TRUE FALSE FALSE FALSE
1 FALSE FALSE FALSE TRUE FALSE
1 FALSE FALSE TRUE FALSE FALSE
2 FALSE FALSE TRUE FALSE TRUE
2 TRUE FALSE FALSE FALSE TRUE
2 FALSE TRUE FALSE FALSE TRUE
2 FALSE FALSE FALSE TRUE TRUE
2 TRUE TRUE FALSE FALSE FALSE
2 TRUE FALSE TRUE FALSE FALSE
2 FALSE TRUE FALSE TRUE FALSE
2 TRUE FALSE FALSE TRUE FALSE
2 FALSE TRUE TRUE FALSE FALSE
2 FALSE FALSE TRUE TRUE FALSE
forms <- lapply(seq_len(nrow(b)), function(x) as.formula(paste("Fertility ~", paste(names(which(b[x, ])), collapse = "+"))))
head(forms)
[[1]]
Fertility ~ Infant.Mortality
<environment: 0x00000000199a6af0>
[[2]]
Fertility ~ Agriculture
<environment: 0x00000000199aa5c8>
[[3]]
Fertility ~ Examination
<environment: 0x00000000199ad140>
[[4]]
Fertility ~ Catholic
<environment: 0x00000000199afcb8>
[[5]]
Fertility ~ Education
<environment: 0x00000000199b3790>
[[6]]
Fertility ~ Education + Infant.Mortality
<environment: 0x00000000199b6308>
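The formulas collected in `forms` can then all be fitted in one pass with `lapply()`. The sketch below is a base-R variant that swaps `regsubsets` for `combn()`, so it runs without leaps: it enumerates the same 31 subsets of the five predictors (in a different order) and, unlike the `intercept=F` call above, fits each model with an intercept. The names `preds`, `subsets`, `fits` and `adj_r2` are illustrative only.

```r
# Sketch: enumerate every subset of the swiss predictors with combn()
# (instead of leaps::regsubsets()), then fit each subset with lm().
data(swiss)
preds <- setdiff(names(swiss), "Fertility")  # the 5 predictors

# For each subset size k = 1..5, generate all k-element combinations,
# then flatten into a single list of character vectors (31 in total)
subsets <- unlist(
  lapply(seq_along(preds), function(k) combn(preds, k, simplify = FALSE)),
  recursive = FALSE
)

# Turn each subset into a formula: Fertility ~ x1 + x2 + ...
forms <- lapply(subsets, function(vars) reformulate(vars, response = "Fertility"))

# Fit every model (with an intercept) and keep them in a list
fits <- lapply(forms, lm, data = swiss)

# Label each fit by its right-hand side for easier lookup
names(fits) <- vapply(subsets, paste, character(1), collapse = " + ")

# Example: adjusted R^2 of every model, best first
adj_r2 <- sort(vapply(fits, function(m) summary(m)$adj.r.squared, numeric(1)),
               decreasing = TRUE)
head(adj_r2)
```

Keeping the fits in a named list means any later step (summaries, coefficients, model comparison) is a single `lapply()`/`vapply()` over `fits`.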
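To include interaction terms, first create the biggest possible regression formula (all main effects and all their interactions), then run it through regsubsets: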
library(dplyr) # loading this to make use of select & magrittr pipes
pred <- swiss %>% select(Agriculture:Infant.Mortality) %>% names
all.terms <- as.formula(paste("Fertility", paste(pred, collapse="*"), sep="~"))
mods <- regsubsets(all.terms, data = swiss, nbest = 1000, method = "exhaustive", intercept = FALSE, really.big = TRUE)
b <- summary(mods)[[1]]
str(b)
logi [1:6496, 1:31] FALSE FALSE TRUE FALSE FALSE FALSE ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:6496] "1" "1" "1" "1" ...
..$ : chr [1:31] "Agriculture" "Examination" "Education" "Catholic" ...
# i.e. 6496 candidate models (rows) built from 31 terms: the 5 main effects plus all of their interactions
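Where the 31 columns come from: with five predictors, `a*b*c*d*e` expands to every non-empty subset of them, i.e. 2^5 - 1 = 31 terms (5 main effects, 10 two-way, 10 three-way, 5 four-way and 1 five-way interaction). A small base-R check of this, which also builds the same formula without dplyr; `labs` and `order_counts` are illustrative names.

```r
# Rebuild the full-interaction formula in base R (no dplyr needed)
data(swiss)
pred <- setdiff(names(swiss), "Fertility")   # the 5 predictors
all.terms <- as.formula(paste("Fertility ~", paste(pred, collapse = "*")))

# Expand the formula and list the resulting model terms
labs <- attr(terms(all.terms), "term.labels")
length(labs)  # 31

# Count terms by interaction order (1 = main effect, 5 = five-way interaction);
# interaction labels join variable names with ":"
order_counts <- table(lengths(strsplit(labs, ":", fixed = TRUE)))
order_counts
```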
I would use a list to store all the outcomes, for easier further manipulation (summaries, coefficient extraction, ...).
Something to try:
models <- lapply(1:26, function(i) lm(as.formula(paste0("var", i, " ~ condition*time")), data = main_df))
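A slightly fuller sketch of that list-based approach, assuming `main_df` has columns `var1` to `var26`, `condition` and `time` as in the question. The real data are not shown, so a small simulated stand-in is generated first (drop that part and use your own `main_df`); `responses`, `models`, `summaries` and `coefs` are illustrative names.

```r
# SIMULATED stand-in for main_df (hypothetical data; replace with your own).
set.seed(1)
n <- 40
main_df <- data.frame(
  condition = factor(rep(c("A", "B"), each = n / 2)),
  time      = rep(1:4, times = n / 4)
)
for (i in 1:26) main_df[[paste0("var", i)]] <- rnorm(n)

responses <- paste0("var", 1:26)

# One model per response: var1 ~ condition*time, ..., var26 ~ condition*time
models <- lapply(responses, function(y)
  lm(as.formula(paste(y, "~ condition * time")), data = main_df))
names(models) <- responses

# Further manipulation is then a single call per task:
summaries <- lapply(models, summary)   # list of summary.lm objects
coefs     <- t(sapply(models, coef))   # one row per response variable
head(coefs)
```

Naming the list by response means `models$var7` or `coefs["var7", ]` picks out a single analysis directly.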