How to extract the underlying coefficients from fitting a linear b spline regression in R?
Take the following spline, with one knot and degree one, as an example:
library(splines)
library(ISLR)
age.grid = seq(range(Wage$age)[1], range(Wage$age)[2])
fit.spline = lm(wage~bs(age, knots=c(30), degree=1), data=Wage)
pred.spline = predict(fit.spline, newdata=list(age=age.grid), se.fit=TRUE)
plot(Wage$age, Wage$wage, col="gray")
lines(age.grid, pred.spline$fit, col="red")
# NOTE: This is **NOT** the same as fitting two piecewise linear models, because
# the spline adds the constraint that the function is continuous at age=30
# fit.1 = lm(wage~age, data=subset(Wage,age<30))
# fit.2 = lm(wage~age, data=subset(Wage,age>=30))
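Is there a way to extract the linear models (and their coefficients) before and after the knot? In other words, how can I extract the two linear models on either side of the cut point at age=30?
Calling summary(fit.spline) yields coefficients, but (as far as I understand) they are not meaningful for interpretation.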
You can extract the coefficients manually from fit.spline like this:
summary(fit.spline)
Call:
lm(formula = wage ~ bs(age, knots = 30, degree = 1), data = Wage)
Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)
(Intercept)                         54.19       4.05    13.4   <2e-16 ***
bs(age, knots = 30, degree = 1)1    58.43       4.61    12.7   <2e-16 ***
bs(age, knots = 30, degree = 1)2    68.73       4.54    15.1   <2e-16 ***
---
range(Wage$age)
## [1] 18 80
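## points along the first segment (age 18 to the knot at 30)
## fitted wage is 54.19 at age 18 and 54.19 + 58.43 at age 30
a1 <- seq(18, 30, length.out = 10)
b1 <- seq(54.19, 58.43+54.19, length.out = 10)
## points along the second segment (knot at 30 to age 80)
## fitted wage is 54.19 + 58.43 at age 30 and 54.19 + 68.73 at age 80
a2 <- seq(30, 80, length.out = 10)
b2 <- seq(54.19 + 58.43, 54.19 + 68.73, length.out = 10)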
plot(Wage$age, Wage$wage, col="gray", xlim = c(0, 90))
lines(x = a1, y = b1, col = "blue" )
lines(x = a2, y = b2, col = "red")
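The reason the sums above give the fitted values at ages 18, 30 and 80 can be checked by evaluating the basis at those three points. This is only a small sketch: the boundary knots are passed explicitly as 18 and 80 (an assumption that matches range(Wage$age)), so it runs without the data.

```r
library(splines)

# Degree-1 basis with one interior knot at 30, evaluated at the lower
# boundary, the knot, and the upper boundary. Boundary.knots is set by hand
# here; bs() would otherwise take it from the range of the data.
B <- bs(c(18, 30, 80), knots = 30, degree = 1, Boundary.knots = c(18, 80))

# Drop the bs attributes so only the two basis columns are printed
basis <- matrix(as.numeric(B), nrow = 3,
                dimnames = list(c("age18", "age30", "age80"), c("B1", "B2")))
print(basis)
# age 18 -> (0, 0), age 30 -> (1, 0), age 80 -> (0, 1)
```

So the fitted value is the intercept alone at age 18, intercept plus the first coefficient at age 30, and intercept plus the second coefficient at age 80.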
If you want the slope coefficients of the two linear models, you can simply use
## named slope1/slope2 so they do not overwrite the plotting vectors b1 and b2
slope1 <- (58.43)/(30 - 18)
slope2 <- (68.73 - 58.43)/(80 - 30)
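If you would rather read the slopes straight off the model summary, one alternative (not what the code above does) is to refit the same continuous piecewise-linear model with a truncated-line term, pmax(age - 30, 0). A rough sketch follows; the data frame dat is simulated and purely an assumption so the example runs without ISLR, and you would use data = Wage in practice.

```r
library(splines)

# Simulated stand-in for Wage (an assumption, not the real data)
set.seed(1)
dat <- data.frame(age = sample(18:80, 500, replace = TRUE))
dat$wage <- 50 + 5 * pmin(dat$age, 30) + 0.2 * pmax(dat$age - 30, 0) +
  rnorm(500, sd = 10)

# Same model space, two parameterizations
fit.bs <- lm(wage ~ bs(age, knots = 30, degree = 1), data = dat)
fit.tp <- lm(wage ~ age + pmax(age - 30, 0), data = dat)

# Both fits give the same fitted values
same_fit <- isTRUE(all.equal(unname(fitted(fit.bs)), unname(fitted(fit.tp))))

# In the truncated-line fit the coefficients are directly interpretable
b <- unname(coef(fit.tp))
slope_below <- b[2]          # slope for age < 30
slope_above <- b[2] + b[3]   # slope for age >= 30 (b[3] is the change in slope)
print(c(below = slope_below, above = slope_above))
```

The intercept of fit.tp is then the value of the first segment extrapolated to age = 0.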
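Note that in fit.spline the intercept is the value of wage at age = 18 (the lower boundary knot), whereas in an ordinary linear model the intercept is the value of wage at age = 0.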
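To put both segments in the usual wage = intercept + slope * age form, the intercepts need to be shifted back to age = 0. Here is a minimal sketch using the rounded coefficients from the summary above; segment_lines is a hypothetical helper written for this answer, not a function from splines.

```r
# Hypothetical helper: convert the three coefficients of a degree-1,
# one-knot bs() fit into intercept/slope form for each segment.
segment_lines <- function(beta, knot, lower, upper) {
  beta <- unname(beta)
  y_lower <- beta[1]            # fitted value at the lower boundary
  y_knot  <- beta[1] + beta[2]  # fitted value at the knot
  y_upper <- beta[1] + beta[3]  # fitted value at the upper boundary
  slope <- c((y_knot - y_lower) / (knot - lower),
             (y_upper - y_knot) / (upper - knot))
  # Shift each line back to age = 0
  intercept <- c(y_lower - slope[1] * lower, y_knot - slope[2] * knot)
  data.frame(segment = c("below knot", "above knot"), intercept, slope)
}

# Rounded estimates copied from summary(fit.spline)
res <- segment_lines(c(54.19, 58.43, 68.73), knot = 30, lower = 18, upper = 80)
print(res)
```

With the fitted model at hand you could pass coef(fit.spline) instead of the rounded numbers. Both lines give the same wage at age = 30, which is the continuity constraint mentioned in the question.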
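Extracting the knots is mostly something you do when you specify the degrees of freedom up front in the B-spline regression, so that bs() chooses the knots for you. Example: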
fit.spline = lm(wage~bs(age, df=5), data=Wage)
attr(bs(Wage$age,df=5),"knots")
33.33333% 66.66667%
37 48
An example can be found on page 293 of the ISLR book (which you appear to be using).
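As a quick check of where those knots come from: with df = 5 and the default cubic degree, bs() uses two interior knots and places them at the 1/3 and 2/3 quantiles of the predictor. The sketch below uses simulated ages (an assumption, so it runs without ISLR) rather than Wage$age.

```r
library(splines)

# Simulated ages standing in for Wage$age (an assumption)
set.seed(1)
age <- sample(18:80, 300, replace = TRUE)

# Knots chosen by bs() when only df is given (df - degree = 5 - 3 = 2 knots)
k <- attr(bs(age, df = 5), "knots")

# The same values computed directly as quantiles of age
q <- quantile(age, probs = c(1/3, 2/3))

print(unname(k))
print(unname(q))
```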