Apply Simple Linear Regression to Multiple Data Frames in R
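I have a data set that I split into multiple data frames, and I need to apply a simple linear regression to each of the resulting data frames. My code is below: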
library(dplyr)
library(readr)
library(magrittr)
library(lubridate)
library(stats)
c_data <- read_csv("D:/projects/sloper_tool/data_2013_to_2017.csv")
C_data_out <-
  c_data %>%
  group_by(SAMP_SITE_NAME, STD_CON_LONG_NAME, FILTERED_FLAG) %>%
  mutate(MED_V = median(STD_VALUE_RPTD)) %>%
  mutate(MIN_V = min(STD_VALUE_RPTD)) %>%
  mutate(MAX_V = max(STD_VALUE_RPTD)) %>%
  ungroup() %>%
  select(SAMP_SITE_NAME, STD_CON_LONG_NAME, SAMP_DATE, STD_VALUE_RPTD, STD_ANAL_UNITS_RPTD, FILTERED_FLAG, LAB_QUALIFIER, MED_V, MIN_V, MAX_V) %>%
  rename(Well = SAMP_SITE_NAME, Constit = STD_CON_LONG_NAME, Date = SAMP_DATE, Value = STD_VALUE_RPTD, Unit = STD_ANAL_UNITS_RPTD, Filtered = FILTERED_FLAG, Flag = LAB_QUALIFIER, Median = MED_V, Min = MIN_V, Max = MAX_V) %>%
  mutate(Date = mdy(Date))
dfs <- split(C_data_out, with(C_data_out, interaction(Well, Constit, Filtered)), drop = TRUE)
dfs[2]
This splits the original input into data frames like the following:
$`299-E13-14.Gross alpha.N`
# A tibble: 4 x 10
Well Constit Date Value Unit Filtered Flag Median Min Max
<chr> <chr> <date> <dbl> <chr> <chr> <chr> <dbl> <dbl> <dbl>
1 299-E13-14 Gross alpha 2014-04-11 3.40 pCi/L N <NA> 2.745 1.86 3.89
2 299-E13-14 Gross alpha 2015-04-08 2.09 pCi/L N <NA> 2.745 1.86 3.89
3 299-E13-14 Gross alpha 2016-04-25 3.89 pCi/L N <NA> 2.745 1.86 3.89
4 299-E13-14 Gross alpha 2017-04-06 1.86 pCi/L N <NA> 2.745 1.86 3.89
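Next I need to fit a simple linear regression model to each of the split data frames. I have tried various permutations of the following, to no avail.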
fit <-
dfs %>%
lm(Value ~ Date)
# Get slope by:
slope <- fit$coefficients[[2]]
slope
This gives the following output:
fit <-
dfs %>%
lm(Value ~ Date, data = dfs)
Error in formula.default(object, env = baseenv()) : invalid formula
slope = fit$coefficients[[2]]
Error: object 'fit' not found
slope
(Intercept) Date
109778.966473 -5.093003
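This appears to be applied to the entire original data set rather than to the individual split data frames. I would like to write the slope for each data frame to a file or, better yet, attach the slopes as a vector to the data frames in dfs.
Any help would be greatly appreciated!
Something like this might work. I don't have your data, though, so I couldn't test it.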
# calculate the fit models per data frame
fits <- lapply( dfs, function(x) {
lm( formula = Value ~ Date, data = x )
} )
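# extract the slope (the second coefficient, Date) from each model;
# x$coefficients on its own would return the intercept as well
slopes <- sapply( fits, function(x) x$coefficients[[2]] )
# print one of the results to see it
slopes[1]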
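To get the slopes back onto the data frames, or out to a file, something along these lines may help. It is only a sketch on made-up toy data, since I don't have the real file: `toy`, `toy_dfs`, `toy_slopes`, `slope_table` and the `Slope` column are names invented for the illustration, and the CSV goes to a temporary directory rather than to your project folder.

```r
# Toy data standing in for C_data_out (values are invented for illustration)
toy <- data.frame(
  Well  = rep(c("A", "B"), each = 4),
  Date  = rep(as.Date(c("2014-01-01", "2014-01-02",
                        "2014-01-03", "2014-01-04")), times = 2),
  Value = c(1, 2, 3, 4, 8, 6, 4, 2)
)

# Split into one data frame per well, as in the question
toy_dfs <- split(toy, toy$Well, drop = TRUE)

# Fit one model per data frame and keep only the Date coefficient (the slope).
# lm() treats a Date as a number of days, so the slope is in Value units per day.
toy_slopes <- sapply(toy_dfs, function(d) {
  coef(lm(Value ~ Date, data = d))[["Date"]]
})

# Attach each slope to its own data frame as a constant column
toy_dfs <- Map(function(d, s) {
  d$Slope <- s
  d
}, toy_dfs, toy_slopes)

# One row per group, written out as a CSV file
slope_table <- data.frame(Group = names(toy_slopes), Slope = unname(toy_slopes))
write.csv(slope_table, file.path(tempdir(), "slopes.csv"), row.names = FALSE)
```

With your data you would use `dfs` in place of `toy_dfs`. Keep in mind that a group with a single row cannot have a slope estimated, so its entry would come back as `NA`.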