Extract regression coefficients out of large list in R
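I have a large data frame with about 100 columns, which I split by year. For each column x[i], I want to regress the later year's values (dependent variable) on the previous year's values (independent variable): xS = a0 + a1 * xP + e
My code looks like this: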
d1 <- structure(list(Date=c("2012-01-01", "2012-06-01",
                            "2013-01-01", "2013-06-01", "2014-01-01", "2014-06-01"),
                     x1=c(NA, NA, 17L, 29L, 27L, 10L),
                     x2=c(30L, 19L, 22L, 20L, 11L, 24L),
                     x3=c(NA, 23L, 22L, 27L, 21L, 26L),
                     x4=c(30L, 28L, 23L, 24L, 10L, 17L),
                     x5=c(NA, NA, NA, 16L, 30L, 26L)),
                row.names=c(NA, 6L), class="data.frame")
rownames(d1) <- d1[, "Date"]
d1 <- d1[, -1]
df2012 <- d1[1:2, ]
df2013 <- d1[3:4, ]
df2014 <- d1[5:6, ]  # rows 5 and 6 are the 2014 observations
condlm <- function(i) {
  # ignore the columns only containing NA's
  if (sum(is.na(df2012[, i])) == dim(df2013)[1]) {
    return()
  }
  lm.model <- lm(df2013[, i] ~ df2012[, i])
  summary(lm.model)
}
lms <- lapply(1:dim(df2013)[2], condlm)
lms
zzq <- sapply(lms, coef)
zzq <- do.call(rbind.data.frame, zzq)
zzq <- zzq[grepl("(Intercept)", rownames(zzq)), ]
Edit 2:
lms gives the following output:
[[1]]
NULL
[[2]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.5455 NA NA NA
df2012[, i] 0.1818 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
[[3]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 1 residuals are 0: no residual degrees of freedom!
Coefficients: (1 not defined because of singularities)
Estimate Std. Error t value Pr(>|t|)
(Intercept) 27 NA NA NA
df2012[, i] NA NA NA NA
Residual standard error: NaN on 0 degrees of freedom
(1 observation deleted due to missingness)
[[4]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 38.0 NA NA NA
df2012[, i] -0.5 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
[[5]]
NULL
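[[1]] and [[5]] give me NULL.
Is there a way to modify the function condlm so that it gives me an NA instead of NULL?
Finally, after extracting the intercepts with zzq <- zzq[grepl("(Intercept)", rownames(zzq)), ], my data frame zzq should look like this: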
Estimate Std. Error t value Pr(>|t|)
(Intercept) NA NaN NaN NaN
(Intercept)2 16.54545 NaN NaN NaN
(Intercept)3 27.00000 NaN NaN NaN
(Intercept)4 38.00000 NaN NaN NaN
(Intercept)5 NA NaN NaN NaN
Thanks
You can get the standard errors, p-values, etc. with the following modification:
condlm <- function(i) {
  # ignore the columns only containing NA's
  if (sum(is.na(df2012[, i])) == dim(df2013)[1]) {
    return()
  }
  lm.model <- lm(df2013[, i] ~ df2012[, i])
  summary(lm.model)
}
lms <- lapply(1:dim(df2013)[2], condlm)
lms
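Note, however, that with the sample data as currently structured you do not have enough data to get numeric values for the standard errors, t values, or p-values: each regression has at most two observations for two parameters (intercept and slope), so the fit is exact and there are no residual degrees of freedom left.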
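To make the degrees-of-freedom point concrete, here is a minimal sketch. The first fit uses the x2 values from the question; the second uses four made-up observations per year (purely hypothetical numbers, not from the original post) to show that standard errors become available once there are more observations than parameters.

```r
# Two observations, two parameters (intercept + slope): the line passes
# through both points exactly, so nothing is left to estimate the error from.
y2 <- c(22, 20)  # x2 in 2013, from the sample data
x2 <- c(30, 19)  # x2 in 2012, from the sample data
fit2 <- lm(y2 ~ x2)
df2 <- df.residual(fit2)                    # n - p = 2 - 2
se2 <- coef(summary(fit2))[, "Std. Error"]  # not finite in this case

# Hypothetical data with four observations per year (made up for illustration).
y4 <- c(22, 20, 25, 19)
x4 <- c(30, 19, 27, 21)
fit4 <- lm(y4 ~ x4)
df4 <- df.residual(fit4)                    # n - p = 4 - 2
se4 <- coef(summary(fit4))[, "Std. Error"]  # finite numeric values
```

In practice this means each yearly slice needs at least three complete pairs per column before the summary table carries anything beyond the estimates.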
For example, with your sample data we get the following (partial output):
> lms
[[1]]
NULL
[[2]]
Call:
lm(formula = df2013[, i] ~ df2012[, i])
Residuals:
ALL 2 residuals are 0: no residual degrees of freedom!
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 16.5455 NA NA NA
df2012[, i] 0.1818 NA NA NA
Residual standard error: NaN on 0 degrees of freedom
Multiple R-squared: 1, Adjusted R-squared: NaN
F-statistic: NaN on 1 and 0 DF, p-value: NA
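As for getting NA instead of NULL, one possible approach is to have the function always return a coefficient matrix of the same shape, pre-filled with NA, and to overwrite only the rows that lm actually estimated. The sketch below assumes that is acceptable for your use case; the names condlm_na and na_coefs are hypothetical helpers introduced here, not part of the original code.

```r
# Sample data from the question (2012 and 2013 rows only).
d1 <- data.frame(
  x1 = c(NA, NA, 17L, 29L),
  x2 = c(30L, 19L, 22L, 20L),
  x3 = c(NA, 23L, 22L, 27L),
  x4 = c(30L, 28L, 23L, 24L),
  x5 = c(NA, NA, NA, 16L),
  row.names = c("2012-01-01", "2012-06-01", "2013-01-01", "2013-06-01")
)
df2012 <- d1[1:2, ]
df2013 <- d1[3:4, ]

# Hypothetical template: an all-NA coefficient matrix with the same layout
# as coef(summary(lm(y ~ x))).
na_coefs <- matrix(
  NA_real_, nrow = 2, ncol = 4,
  dimnames = list(c("(Intercept)", "x"),
                  c("Estimate", "Std. Error", "t value", "Pr(>|t|)"))
)

# Hypothetical replacement for condlm: always returns a 2 x 4 matrix.
condlm_na <- function(i) {
  x <- df2012[, i]
  y <- df2013[, i]
  out <- na_coefs
  # No complete (x, y) pair: nothing to fit, return the all-NA template.
  if (!any(complete.cases(x, y))) {
    return(out)
  }
  cf <- coef(summary(lm(y ~ x)))
  # summary() drops coefficients that could not be estimated, so copy the
  # available rows by name and leave the rest as NA.
  out[rownames(cf), ] <- cf
  out
}

coef_list <- lapply(seq_along(df2013), condlm_na)

# One row of intercept statistics per column of the data.
intercepts <- do.call(
  rbind,
  lapply(coef_list, function(m) m["(Intercept)", , drop = FALSE])
)
rownames(intercepts) <- names(df2013)
intercepts
```

Because every list element has the same shape, the intercept rows can be stacked with a plain rbind and no grepl on row names is needed; labeling the rows with the column names also makes it clear which variable each intercept belongs to.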