How to perform linear regression for multiple columns and get a dataframe output with the regression equation and R-squared value?
My data frame looks like this:
df = structure(list(Date_Time_GMT_3 = structure(c(1625025600, 1625026500,1625027400, 1625028300, 1625029200, 1625030100),
class = c("POSIXct", "POSIXt"), tzone = "EST"),
X20676887_X2LH_S = c(26.879, 26.781,26.683, 26.585, 26.488, 26.39),
X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
X20822214_X4LH_S = c(26.39, 26.292,26.292, 26.195, 26.097, 26),
LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)),
row.names = 2749:2754, class = "data.frame")
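I am trying to get the linear regression equation and R-squared value for every column, where the column with the string "Stationary" in its name will always be on the x axis.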
So far, I can run a linear regression for one column against the "Stationary" column:
model = lm(df$LH27_20822244_U_Stationary ~
df$X20822214_X4LH_S, df)
When I then run
summary(model)
it gives me some of the values I want in a data frame (i.e. R squared, Estimate Std., Std. Error, Pr(>|t|)), but there are two things I need help with:
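- I still need the regression equation for every column that does not have "stationary" in its name.
- I need these values for every column that does not have "stationary" in its name, and I need the result to be a data frame that looks like this...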
  Logger_ID        Reg_equation R_Squared Estimate_Std. Std_Error Pr_t..
  <chr>                   <int>     <int>         <int>     <int>  <int>
1 X20676887_X2LH_S           NA        NA            NA        NA     NA
2 X20819831_11LH_S           NA        NA            NA        NA     NA
3 X20822214_X4LH_S           NA        NA            NA        NA     NA
Something like this:
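library(tidyverse)
library(broom)

df %>%
  # Stack the logger columns (names starting with "X") into name/value pairs
  pivot_longer(
    cols = starts_with("X")
  ) %>%
  mutate(name = factor(name)) %>%
  # Split into one data frame per logger column
  group_by(name) %>%
  group_split() %>%
  # Fit one model per logger and row-bind the summaries
  map_dfr(.f = function(df){
    lm(LH27_20822244_U_Stationary ~ value, data = df) %>%
      glance() %>%   # model-level statistics (R squared etc.)
      # tidy() %>%   # use instead of glance() for the coefficient table
      add_column(name = unique(df$name), .before = 1)
  })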
Using tidy():
  name             term        estimate std.error statistic p.value
  <fct>            <chr>          <dbl>     <dbl>     <dbl>   <dbl>
1 X20676887_X2LH_S (Intercept)   12.8      2.28        5.62 0.00494
2 X20676887_X2LH_S value          0.393    0.0855      4.59 0.0101
3 X20819831_11LH_S (Intercept)   10.4      3.72        2.79 0.0495
4 X20819831_11LH_S value          0.492    0.142       3.47 0.0256
5 X20822214_X4LH_S (Intercept)   10.5      3.30        3.20 0.0329
6 X20822214_X4LH_S value          0.485    0.126       3.86 0.0182
Using glance():
  name          r.squared adj.r.squared  sigma statistic p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>             <dbl>         <dbl>  <dbl>     <dbl>   <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 X20676887_X2~     0.841         0.801 0.0350      21.1  0.0101     1   12.8 -19.6 -20.3  0.00490           4     6
2 X20819831_11~     0.751         0.688 0.0438      12.0  0.0256     1   11.5 -17.0 -17.6  0.00766           4     6
3 X20822214_X4~     0.788         0.735 0.0403      14.9  0.0182     1   12.0 -17.9 -18.6  0.00651           4     6
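If a single table in the shape requested in the question is wanted (one row per logger, including the equation as text), a base R sketch along the following lines may help. It is only an illustration: the helper name fit_one, the output column names Estimate and Pr_t, and the "y = a + b * x" text format are assumptions, not part of the original posts. It follows the formula used above, with the Stationary column as the response; swap the two sides of the formula if the Stationary column should be the predictor instead.

```r
# Sample data from the question (the date column is left out for brevity)
df <- data.frame(
  X20676887_X2LH_S = c(26.879, 26.781, 26.683, 26.585, 26.488, 26.39),
  X20819831_11LH_S = c(26.39, 26.292, 26.195, 26.195, 26.097, 26),
  X20822214_X4LH_S = c(26.39, 26.292, 26.292, 26.195, 26.097, 26),
  LH27_20822244_U_Stationary = c(23.388, 23.292, 23.292, 23.196, 23.196, 23.196)
)

# Hypothetical helper: fit Stationary ~ <logger column> and return one row
fit_one <- function(logger_id, data) {
  model <- lm(
    reformulate(logger_id, response = "LH27_20822244_U_Stationary"),
    data = data
  )
  s <- summary(model)
  coefs <- coef(s)  # matrix: Estimate, Std. Error, t value, Pr(>|t|)
  intercept <- coefs["(Intercept)", "Estimate"]
  slope <- coefs[logger_id, "Estimate"]
  sign_chr <- if (slope >= 0) "+" else "-"
  data.frame(
    Logger_ID = logger_id,
    # Equation text; y is the Stationary column, x is the logger column
    Reg_equation = sprintf("y = %.3f %s %.3f * x", intercept, sign_chr, abs(slope)),
    R_Squared = s$r.squared,
    Estimate = slope,
    Std_Error = coefs[logger_id, "Std. Error"],
    Pr_t = coefs[logger_id, "Pr(>|t|)"]
  )
}

# Logger columns are assumed to be the ones whose names start with "X"
logger_cols <- grep("^X", names(df), value = TRUE)

# One row per logger column, bound into a single data frame
result <- do.call(rbind, lapply(logger_cols, fit_one, data = df))
print(result)
```

The Estimate, Std_Error and Pr_t columns here refer to the slope term only; the intercept appears inside Reg_equation.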