如何对多列执行线性回归并获得数据框输出：回归方程和 r 平方值？

Question

我的数据框看起来像这样

df = structure(list(Date_Time_GMT_3 = structure(c(1625025600, 1625026500,1625027400, 1625028300, 1625029200, 1625030100), 
                                                class = c("POSIXct", "POSIXt"), tzone = "EST"), 
                    X20676887_X2LH_S = c(26.879, 26.781,26.683, 26.585, 26.488, 26.39), 
                    X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26), 
                    X20822214_X4LH_S = c(26.39, 26.292,26.292, 26.195, 26.097, 26), 
                    LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)), 
               row.names = 2749:2754, class = "data.frame")

我正在尝试获取所有列的线性回归方程和 R 平方值，其中带有 string "Stationary" 的列将始终位于 x 轴上。

到目前为止，我可以针对 "stationary" 列

执行 1 列的线性回归


model = lm(df$LH27_20822244_U_Stationary ~
             df$X20822214_X4LH_S, df)

当我使用

summary(model)

之后它给了我一些我想要在数据框中的值（即 R squared、Estimate Std.、Std. Error、Pr(>|t|)），但有两件事我需要帮助有：

我仍然需要名称中没有 stationary 的每一列的回归方程
我需要这些名称中没有 stationary 的列的这些值，我需要它是一个看起来像这样的数据框...

 Logger_ID        Reg_equation R_Squared Estimate_Std. Std_Error  Pr_t..
  <chr>            <int>               <int>     <int>        <int>     <int>   
1 X20676887_X2LH_S NA                  NA        NA            NA         NA      
2 X20819831_11LH_S NA                  NA        NA            NA         NA      
3 X20822214_X4LH_S NA                  NA        NA            NA         NA

Answer 1

像这样：

library(tidyverse)
library(broom)
df1 %>% 
  pivot_longer(
    cols = starts_with("X")
  ) %>% 
  mutate(name = factor(name)) %>% 
  group_by(name) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>% 
      glance() %>% 
      # tidy() %>%  
      add_column(name = unique(df$name), .before=1)
  })

使用tidy()

  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101 
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495 
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256 
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329 
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182

使用glance()

  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6

如何对多列执行线性回归并获得数据框输出：回归方程和 r 平方值？

How to perform linear regression for multiple columns and get a dataframe output with: regression equation and r squared value?

linear-equation

r

linear-regression