Looping lm models over columns in a list of data frames and outputting data frames showing the slopes and p-values
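I want to loop lm() models for the variables i (responses) against an explanatory variable, across a list of data frames split by a factor. In the end, I want to create two data frames holding the lm coefficients: the first with the slope and the second with the p.value, with the response variables tested in the models as columns and the factor levels as rows.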
I managed to run the lm models and print their summary output, but I am not sure how to build the slope and p.value data frames.
Here is what I did:
data(iris)
iris_split = split(iris, f = iris$Species)  # split the data by the factor "Species"
I want to run lm models for each of the following variables (treated as responses for the sake of the question) against Petal.Width:
vars = as.vector(unique(colnames(subset(iris, select = -c(Species, Petal.Width)))))
#Output:
#> vars
#[1] "Sepal.Length" "Sepal.Width" "Petal.Length"
iris_lm = for (i in vars) {  # loop across vars
  lm_summary = lapply(iris_split, FUN = function(x)
    summary(lm(x[, i] ~ x[, "Petal.Width"])))  # where x is the data for one level of the factor "Species"
  print(i)  # so I can see which variable is tested in the model
  print(lm_summary)
}
How can I create slop.df and p.val.df?
They need to look like this:
#> slop.df
#     Species Sepal.Length Sepal.Width Petal.Length
#1     setosa       slope?      slope?       slope?
#2 versicolor       slope?      slope?       slope?
#3  virginica       slope?      slope?       slope?
The actual slopes need to appear in place of the "slope?" placeholders, and the same goes for p.val.df.
Packages from the [tidyverse][1] make this fairly convenient:
library(dplyr)
library(tidyr)
iris %>%
  pivot_longer(-c(Species, Petal.Width),
               names_to = 'variable',
               values_to = 'value'
  ) %>%
  group_by(Species, variable) %>%
  ## mind to return the model results as a list!
  ## each response (value) is regressed on Petal.Width, as in the question
  summarise(model_summary = list(summary(lm(value ~ Petal.Width)))) %>%
  rowwise() %>%
  mutate(slope = model_summary$coefficients[2, 'Estimate'],
         ## p = model_summary$coefficients[2, 'Pr(>|t|)']
  ) %>%
  ungroup() %>%
  pivot_wider(id_cols = Species,
              names_from = 'variable',
              values_from = 'slope')
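For comparison, both tables can also be built in base R, staying close to the split()/lapply() approach from the question. This is only a minimal sketch and not part of the original answer: extract_coef() is a hypothetical helper name, and the code assumes each response is regressed on Petal.Width, as in the question's loop, so that every coefficient table has a "Petal.Width" row.

```r
# Sketch in base R; extract_coef() is a hypothetical helper introduced for illustration.
data(iris)
iris_split <- split(iris, f = iris$Species)
vars <- setdiff(colnames(iris), c("Species", "Petal.Width"))

# For one column of the coefficient table (e.g. "Estimate"), fit
# response ~ Petal.Width for every species/response pair and return
# a data frame with species in rows and responses in columns.
extract_coef <- function(column) {
  values <- sapply(vars, function(v) {
    sapply(iris_split, function(d) {
      fit <- lm(reformulate("Petal.Width", response = v), data = d)
      summary(fit)$coefficients["Petal.Width", column]
    })
  })
  data.frame(Species = rownames(values), values, row.names = NULL)
}

slop.df  <- extract_coef("Estimate")   # slopes of Petal.Width
p.val.df <- extract_coef("Pr(>|t|)")   # p-values of those slopes
```

Passing the column name as an argument keeps the model fitting in one place, so the slope and p-value tables cannot drift apart. The tidyverse pipeline above can likely produce p.val.df in a similar way by enabling the commented-out p line and using values_from = 'p'.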