Outputting all regression coefficients for a regression loop in R using sapply?
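I am new to R, coding, and Stack Overflow. I am trying to run a multiple linear regression for each level of an ordinal variable 'Age'. Age has 10 possible integer values. All other variables are continuous.
I have partly managed to get regression output for each level of 'Age', but I can't work out how to display the full summary table of coefficients for each level in the loop.
Here is what I mean: when I run the regression on the Age==1 subset, I get the following summary output: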
##Regression for Age==1
library(readr)  # provides read_csv()
Final_Frame.df <- read_csv("mydata.csv")
dim(Final_Frame.df)
Age_1=Final_Frame.df[Final_Frame.df$Age==1,]
dim(Age_1)
Effects_lm=lm(Product_Sum~Mean_social_combined +
Mean_traditional_time+
Mean_Passive_Use_Updated+
Mean_Active_Use_Updated, data=Age_1)
summary(Effects_lm)
This is the output:
Call:
lm(formula = Product_Sum ~ Mean_social_combined + Mean_traditional_time +
Mean_Passive_Use_Updated + Mean_Active_Use_Updated, data = Age_1)
Residuals:
Min 1Q Median 3Q Max
-23.367 -11.079 -2.066 9.540 48.903
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.485 7.491 2.067 0.0444 *
Mean_social_combined -1.086 5.625 -0.193 0.8477
Mean_traditional_time 1.310 3.311 0.396 0.6942
Mean_Passive_Use_Updated -3.004 3.377 -0.889 0.3784
Mean_Active_Use_Updated 9.130 5.914 1.544 0.1295
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 15.92 on 46 degrees of freedom
(5 observations deleted due to missingness)
Multiple R-squared: 0.05779, Adjusted R-squared: -0.02414
F-statistic: 0.7053 on 4 and 46 DF, p-value: 0.5924
(Intercept) Mean_social_combined Mean_traditional_time Mean_Passive_Use_Updated
10.1801725 0.5982227 0.2666642 -1.7716028
Mean_Active_Use_Updated
11.6577843
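But when I run it for all levels, I don't get the same amount of information. The closest I have got is using .coefs, which only yields the regression coefficients.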
##Functionalisation
my.JORT.lm=function(Age.df) {coef(lm(Product_Sum~Mean_social_combined +
Mean_traditional_time+
Mean_Passive_Use_Updated+
Mean_Active_Use_Updated, data=Age.df))}
##Split age level
Age.by.level=split(Final_Frame.df, f=Final_Frame.df$Age)
class(Age.by.level)
names(Age.by.level)
#Regression output for all 10 levels of age
Final_Frame2.df=sapply(Age.by.level, FUN=my.JORT.lm)
Final_Frame2.df.coefs
Output:
1 2 3 4 5 6 7 8 9
(Intercept) 15.485342 19.671566 -2.6799874 6.707780 -2.9383992 6.074756 6.535079 -2.833462 4.070595
Mean_social_combined -1.086346 6.727591 6.3753196 2.006972 2.2910173 -3.647688 -7.492282 -3.232723 -1.590179
Mean_traditional_time 1.309759 -2.017883 0.6843741 4.795550 1.4745771 2.983761 4.227461 5.406311 1.985889
Mean_Passive_Use_Updated -3.003786 -8.415782 -2.8591079 4.999754 -0.6350261 5.354196 5.413747 3.588647 5.573119
Mean_Active_Use_Updated 9.129950 15.154421 10.5226187 -11.222790 9.7848515 -3.432742 -2.406095 2.851160 -4.111706
10
(Intercept) -18.799694
Mean_social_combined 24.837171
Mean_traditional_time 1.043116
Mean_Passive_Use_Updated 3.725663
Mean_Active_Use_Updated -6.127393
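When I try to retrieve the R-squared and adjusted R-squared using $rsq, I get "Error in Final_Frame2.df$rsq : $ operator is invalid for atomic vectors". Can someone let me know how to replicate the Age==1 output for the more complex regression? I specifically need the p-values, R2 and adjusted R2 values. I hope this question is clear enough. Thanks!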
If you only need the summaries, you can do it all with by. Here is an example using mtcars.
my.fun <- function(df){summary(lm(mpg ~ hp + drat, data=df))}
by(mtcars, list(mtcars$cyl), my.fun)
# : 4
#
# Call:
# lm(formula = mpg ~ hp + drat, data = df)
#
# Residuals:
# Min 1Q Median 3Q Max
# -3.038 -2.580 -1.433 1.670 7.306
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 22.58423 20.15984 1.120 0.295
# hp -0.08962 0.07144 -1.254 0.245
# drat 2.82123 4.09205 0.689 0.510
#
# Residual standard error: 4.174 on 8 degrees of freedom
# Multiple R-squared: 0.3148, Adjusted R-squared: 0.1435
# F-statistic: 1.837 on 2 and 8 DF, p-value: 0.2205
#
# ------------------------------------------------------------------------
# : 6
#
# Call:
# lm(formula = mpg ~ hp + drat, data = df)
#
# Residuals:
# Mazda RX4 Mazda RX4 Wag Hornet 4 Drive Valiant Merc 280 Merc 280C
# 0.9964 0.9964 1.7704 -1.4315 -0.6885 -2.0885
# Ferrari Dino
# 0.4453
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 19.276291 5.955542 3.237 0.0318 *
# hp -0.009557 0.030111 -0.317 0.7668
# drat 0.456031 1.534477 0.297 0.7811
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 1.747 on 4 degrees of freedom
# Multiple R-squared: 0.0374, Adjusted R-squared: -0.4439
# F-statistic: 0.07771 on 2 and 4 DF, p-value: 0.9266
#
# ------------------------------------------------------------------------
# : 8
#
# Call:
# lm(formula = mpg ~ hp + drat, data = df)
#
# Residuals:
# Min 1Q Median 3Q Max
# -3.9390 -1.1833 0.1403 1.6083 3.5607
#
# Coefficients:
# Estimate Std. Error t value Pr(>|t|)
# (Intercept) 11.57381 6.29341 1.839 0.093 .
# hp -0.02861 0.01840 -1.555 0.148
# drat 2.94579 2.51872 1.170 0.267
# ---
# Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#
# Residual standard error: 2.517 on 11 degrees of freedom
# Multiple R-squared: 0.1822, Adjusted R-squared: 0.03345
# F-statistic: 1.225 on 2 and 11 DF, p-value: 0.3309
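The `$` error in the question comes from the shape of the result. sapply() over coef() returns a plain numeric matrix, and a matrix is an atomic vector, so there is nothing for `$` to extract. A minimal sketch of one way around that, still using mtcars as a stand-in: keep the summary objects in a list and pull values out of each one. The names groups, summaries, r2, r2.adj and coef.mat are made up for this illustration.

```r
# Fit one model per cylinder group and keep the summary objects in a list
groups <- split(mtcars, mtcars$cyl)
summaries <- lapply(groups, function(d) summary(lm(mpg ~ hp + drat, data = d)))

# Each element is a full summary.lm object, so `$` works on it
summaries[["4"]]$r.squared
summaries[["4"]]$adj.r.squared
coef(summaries[["4"]])  # matrix with Estimate, Std. Error, t value, Pr(>|t|)

# One value per group, returned as a named numeric vector
r2 <- sapply(summaries, function(s) s$r.squared)
r2.adj <- sapply(summaries, function(s) s$adj.r.squared)

# By contrast, sapply() over coef() gives a numeric matrix (atomic),
# which is why `$` fails on it
coef.mat <- sapply(groups, function(d) coef(lm(mpg ~ hp + drat, data = d)))
is.atomic(coef.mat)
```

For the data in the question, the same pattern should carry over by splitting on Age and swapping in the Product_Sum formula, assuming the column names are as posted.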
Edit: Return the coefficients, p-values and R2
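my.fun <- function(df){
  # fit the model once; keep both the fit (m) and its summary (s)
  s <- summary(m <- lm(mpg ~ hp + drat, data=df))
  b <- coef(m)                # coefficient estimates
  p <- s$coefficients[,4]     # p-values: column 4, "Pr(>|t|)"
  bp <- c(rbind(b, p))        # interleave: estimate, p-value, estimate, ...
  names(bp) <- c(rbind(names(b), paste(names(b), "p", sep="_")))
  out <- c(bp, r2 = s$r.squared, r2.adj = s$adj.r.squared)
  out
}
sapply(split(mtcars, mtcars$cyl), my.fun)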
# 4 6 8
# (Intercept) 22.58423314 19.276290671 11.57380765
# (Intercept)_p 0.29510529 0.031773165 0.09303542
# hp -0.08961637 -0.009556567 -0.02861472
# hp_p 0.24508497 0.766829526 0.14815496
# drat 2.82123087 0.456031349 2.94579083
# drat_p 0.51004482 0.781115913 0.26689654
# r2 0.31476959 0.037400884 0.18215026
# r2.adj 0.14346199 -0.443898674 0.03345031
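If the standard errors and t values are wanted as well, a possible variation is to stack each group's coefficient table into one long data frame. This is only a sketch: coef.table, its level argument, and the column names are invented here for illustration and are not part of the answer above.

```r
# Build one data frame per group from the summary's coefficient matrix
coef.table <- function(d, level) {
  s <- summary(lm(mpg ~ hp + drat, data = d))
  cm <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)
  data.frame(level = level,
             term = rownames(cm),
             estimate = cm[, "Estimate"],
             std.error = cm[, "Std. Error"],
             t.value = cm[, "t value"],
             p.value = cm[, "Pr(>|t|)"],
             r2 = s$r.squared,         # repeated on every row of the group
             r2.adj = s$adj.r.squared,
             row.names = NULL)
}

groups <- split(mtcars, mtcars$cyl)
# Map() passes each subset together with its group label
long <- do.call(rbind, Map(coef.table, groups, names(groups)))
rownames(long) <- NULL
long
```

The long layout has one row per group and term, which tends to be easier to filter or export than the wide matrix that sapply() returns.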