Looping lm models of columns in a list of data frames and outputting data frames showing the slope and p-values

Question

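I want to loop lm() models of variable i (the response) against an explanatory variable, over a list of data frames split by a factor. At the end, I want to create two data frames showing the lm coefficients: the first showing the slope and the second the p.value, with the response variables tested in the models as columns and the factor levels as rows.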

I managed to run the lm models and print the output of summary(), but I'm not sure how to build the corresponding slope and p.value data frames.

Here is what I did:

data(iris)
iris_split = split(iris, f = iris$Species) # split the data by the factor "Species"

I want to run an lm model for each of the following variables (treated as responses for the sake of the question) against Petal.Width:

vars = as.vector(unique(colnames(subset(iris, select = -c(Species, Petal.Width)))))

#Output:
#> vars
#[1] "Sepal.Length" "Sepal.Width"  "Petal.Length"
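A shorter way to build the same vector, assuming the standard iris column names: setdiff() drops the excluded columns and keeps the rest in their original order, so the as.vector()/unique()/subset() chain is not needed.

```r
data(iris)
# Remove the factor and the explanatory variable; the remaining names
# are the responses to model, in column order.
vars <- setdiff(colnames(iris), c("Species", "Petal.Width"))
print(vars)
```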

for (i in vars) { # loop across vars (a for loop returns NULL, so nothing is assigned)
  # x is the data frame for one level of the factor "Species"
  lm_summary = lapply(iris_split, FUN = function(x)
    summary(lm(x[, i] ~ x[, "Petal.Width"])))
  print(i) # so I can see which variable is tested in the model
  print(lm_summary)
}
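For reference, the numbers being asked about live in the coefficient matrix of each summary. The sketch below (one model, setosa only, names slope and p_value chosen for illustration) shows how to pull them out; coef() on a summary.lm object returns a matrix with one row per term and the columns "Estimate", "Std. Error", "t value" and "Pr(>|t|)". With the question's x[, i] ~ x[, "Petal.Width"] formula the row label is the deparsed expression rather than Petal.Width, so indexing row 2 by position is the simpler choice there.

```r
data(iris)
iris_split <- split(iris, f = iris$Species)

# One model: Sepal.Length (response) on Petal.Width, setosa subset only.
fit_summary <- summary(lm(Sepal.Length ~ Petal.Width, data = iris_split$setosa))

# Coefficient table: row 1 is the intercept, row 2 the Petal.Width term.
coef_table <- coef(fit_summary)
slope   <- coef_table[2, "Estimate"]
p_value <- coef_table[2, "Pr(>|t|)"]
print(c(slope = slope, p_value = p_value))
```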

How can I create slop.df and p.val.df? They need to look like this:

#> slop.df
#     Species Sepal.Length Sepal.Width Petal.Length
#1     setosa       slope?      slope?       slope?
#2 versicolor       slope?      slope?       slope?
#3  virginica       slope?      slope?       slope?

The actual slopes need to appear instead of the "slope?" placeholders, and the same goes for p.val.df with the p-values.
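A base-R sketch of one way to build both tables, keeping the question's setup (iris split by Species, each variable in vars as the response, Petal.Width as the explanatory variable). The helpers fit_stats() and make_table() are hypothetical names introduced here; the model formula is built with reformulate() so the coefficient row is simply the second one.

```r
data(iris)
iris_split <- split(iris, f = iris$Species)
vars <- setdiff(colnames(iris), c("Species", "Petal.Width"))

# Slope and p-value of the Petal.Width term for one response in one subset.
# Row 2 of the coefficient matrix is Petal.Width (row 1 is the intercept).
fit_stats <- function(df, response) {
  fit <- lm(reformulate("Petal.Width", response = response), data = df)
  coef(summary(fit))[2, c("Estimate", "Pr(>|t|)")]
}

# Species x variable table for one statistic ("Estimate" or "Pr(>|t|)").
# The inner sapply() gives one value per variable; the outer one stacks
# the species as columns, so t() puts species in rows.
make_table <- function(stat) {
  m <- t(sapply(iris_split, function(df)
    sapply(vars, function(v) fit_stats(df, v)[[stat]])))
  data.frame(Species = rownames(m), m, row.names = NULL)
}

slop.df  <- make_table("Estimate")
p.val.df <- make_table("Pr(>|t|)")
print(slop.df)
print(p.val.df)
```

Fitting every model twice keeps the helper simple; for many variables it may be worth fitting once and extracting both columns from the same result.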

Answer 1

Packages from the [tidyverse][1] make this fairly convenient:

library(tidyverse)

iris %>% 
    pivot_longer(-c(Species, Petal.Width),
                 names_to = 'variable',
                 values_to = 'value'
                 ) %>% 
    group_by(Species, variable) %>% 
    ## wrap the model summary in list() so it can be stored in a list-column;
    ## value is the response and Petal.Width the explanatory variable, as in the question
    summarise(model_summary = list(summary(lm(value ~ Petal.Width))),
              .groups = 'drop') %>% 
    rowwise %>%
    mutate(slope = model_summary$coefficients[2, 'Estimate'],
           ## p = model_summary$coefficients[2, 'Pr(>|t|)']
           ) %>%
    ungroup %>%
    pivot_wider(id_cols = Species,
                names_from = 'variable',
                values_from = 'slope')
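The pipeline above keeps only the slope. A sketch that keeps both statistics and spreads each into its own table, assuming the tidyverse (dplyr, tidyr, purrr) is installed and, following the question, regressing each variable on Petal.Width. The names iris_stats and coefs are illustrative; purrr::map_dbl() replaces rowwise() to pull one number out of each stored coefficient matrix.

```r
library(tidyverse)

# One model per (Species, variable) pair; the variable is the response
# and Petal.Width the explanatory variable.
iris_stats <- iris %>%
  pivot_longer(-c(Species, Petal.Width),
               names_to = "variable", values_to = "value") %>%
  group_by(Species, variable) %>%
  summarise(coefs = list(coef(summary(lm(value ~ Petal.Width)))),
            .groups = "drop") %>%
  # Row 2 of each coefficient matrix is the Petal.Width term.
  mutate(slope = map_dbl(coefs, ~ .x[2, "Estimate"]),
         p     = map_dbl(coefs, ~ .x[2, "Pr(>|t|)"]))

# Spread each statistic into a Species x variable table.
slop.df <- iris_stats %>%
  pivot_wider(id_cols = Species, names_from = variable, values_from = slope)
p.val.df <- iris_stats %>%
  pivot_wider(id_cols = Species, names_from = variable, values_from = p)
print(slop.df)
print(p.val.df)
```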


Tags: loops, r, lapply, dataframe, lm