How to interpret lm() coefficient estimates when using bs() function for splines

Question

I have a set of points forming a "symmetric V-shape", running from (-5,5) to (0,0) and on to (5,5). I am fitting a model with the lm() and bs() functions to get a "V-shape" spline:

lm(formula = y ~ bs(x, degree = 1, knots = c(0)))

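When I predict outcomes with predict() and draw the prediction line, I get the "V-shape". But when I look at the model estimates with coef(), I see estimates that I do not expect.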

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)                       4.93821    0.16117  30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079    0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545    0.21701  -0.256    0.805

I would expect a coefficient of -1 for the first part and +1 for the second part. Do I have to interpret the estimates in a different way?

If I fill in the knot manually in the lm() function, I get these coefficients:

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.18258    0.13558  -1.347    0.215    
x           -1.02416    0.04805 -21.313 2.47e-08 ***
z            2.03723    0.08575  23.759 1.05e-08 ***

That's more like it. The change in slope contributed by z (the knot term), relative to x, is ~ +1.

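I want to understand how to interpret the bs() result. I have checked that the manual model and the bs() model give exactly the same predicted values.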
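For reference, the "manual" parameterization mentioned above can be sketched as follows. The question does not show how z was built, so this assumes z is the hinge term pmax(x - knot, 0); the data are simulated for illustration and are not the asker's data.

```r
library(splines)

# Hypothetical symmetric V-shaped data (not the asker's data)
set.seed(1)
x <- seq(-5, 5, length.out = 11)
y <- abs(x) + rnorm(length(x), sd = 0.1)

# Assumed manual hinge term: zero left of the knot, x - knot right of it
knot <- 0
z <- pmax(x - knot, 0)

fit_manual <- lm(y ~ x + z)
fit_bs     <- lm(y ~ bs(x, degree = 1, knots = knot))

# Slopes implied by the manual fit: x alone on the left, x + z on the right
slope_left  <- unname(coef(fit_manual)["x"])
slope_right <- unname(coef(fit_manual)["x"] + coef(fit_manual)["z"])

# Both parameterizations span the same function space, so fitted values agree
same_fit <- isTRUE(all.equal(unname(fitted(fit_manual)), unname(fitted(fit_bs))))
```

Under this assumption the coefficient of x is the left slope and the coefficient of z is the change in slope at the knot, which is why a z estimate of about +2 corresponds to a right-hand slope of about +1.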

Answer 1

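A simple example of a first-degree spline with a single knot, and how to interpret the estimated coefficients to calculate the slopes of the fitted lines: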

library(splines)
set.seed(313)
x<-seq(-5,+5,len=1000)
y<-c(seq(5,0,len=500)+rnorm(500,0,0.25),
     seq(0,10,len=500)+rnorm(500,0,0.25))
plot(x,y, xlim = c(-6,+6), ylim = c(0,+8))
fit <- lm(formula = y ~ bs(x, degree = 1, knots = c(0)))
x.predict <- seq(-2.5,+2.5,len = 100)
lines(x.predict, predict(fit, data.frame(x = x.predict)), col =2, lwd = 2)

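This produces a plot of the data with the fitted line. Because we are fitting a spline with degree=1 (i.e. straight lines) and a knot at x=0, we get two line segments, one for x<=0 and one for x>0.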

The coefficients are

> round(summary(fit)$coefficients,3)
                                 Estimate Std. Error  t value Pr(>|t|)
(Intercept)                         5.014      0.021  241.961        0
bs(x, degree = 1, knots = c(0))1   -5.041      0.030 -166.156        0
bs(x, degree = 1, knots = c(0))2    4.964      0.027  182.915        0

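These can be translated into the slope of each of the two line segments using the knot (which we specified at x=0) and the boundary knots (the min/max of the explanatory data):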

# two boundary knots and one specified
knot.boundary.left <- min(x)
knot <- 0
knot.boundary.right <- max(x)

slope.1 <- summary(fit)$coefficients[2,1] /(knot - knot.boundary.left)
slope.2 <- (summary(fit)$coefficients[3,1] - summary(fit)$coefficients[2,1]) / (knot.boundary.right - knot)
slope.1
slope.2
> slope.1
[1] -1.008238
> slope.2
[1] 2.000988
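The slope formulas work because, for this degree-1 basis, the fitted value is the intercept at the left boundary knot, intercept plus the first coefficient at the interior knot, and intercept plus the second coefficient at the right boundary knot. As a sanity check, a minimal sketch (re-simulating the same data as above) reads the slopes off the fitted line with predict() and compares them with the coefficient-based values:

```r
library(splines)

# Same simulated data as in the example above
set.seed(313)
x <- seq(-5, 5, len = 1000)
y <- c(seq(5, 0, len = 500) + rnorm(500, 0, 0.25),
       seq(0, 10, len = 500) + rnorm(500, 0, 0.25))
fit <- lm(y ~ bs(x, degree = 1, knots = 0))

# Fitted values at the left boundary knot, the interior knot, the right boundary knot
p <- predict(fit, data.frame(x = c(min(x), 0, max(x))))
slope_left_pred  <- unname((p[2] - p[1]) / (0 - min(x)))
slope_right_pred <- unname((p[3] - p[2]) / (max(x) - 0))

# Slopes derived from the coefficients, as in the answer
b <- unname(coef(fit))
slope_left_coef  <- b[2] / (0 - min(x))
slope_right_coef <- (b[3] - b[2]) / (max(x) - 0)
```

The two routes should agree up to floating-point error, since both describe the same pair of line segments.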

Answer 2

I would expect a -1 coefficient for the first part and a +1 coefficient for the second part.

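I think your question is really about what a B-spline function is. If you want to understand the meaning of the coefficients, you need to know what the basis functions of your spline are. See the following: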

library(splines)
x <- seq(-5, 5, length = 100)
b <- bs(x, degree = 1, knots = 0)  ## returns a basis matrix
str(b)  ## check structure
b1 <- b[, 1]  ## basis 1
b2 <- b[, 2]  ## basis 2
par(mfrow = c(1, 2))
plot(x, b1, type = "l", main = "basis 1: b1")
plot(x, b2, type = "l", main = "basis 2: b2")

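Notes:

Degree-1 B-splines are tent functions, as you can see from b1;
Degree-1 B-splines are scaled so that their function values lie between 0 and 1;
The knots of a degree-1 B-spline are where it bends;
Degree-1 B-splines are compactly supported, and are non-zero only over (no more than) three adjacent knots.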
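These properties can be checked numerically on a small grid. The sketch below evaluates the same basis at five points, using the default boundary knots (the range of x):

```r
library(splines)

# Evaluate the degree-1 basis at the boundary knots, the interior knot,
# and the midpoints in between
x <- c(-5, -2.5, 0, 2.5, 5)
b <- bs(x, degree = 1, knots = 0)

b1 <- as.numeric(b[, 1])  # tent: rises from 0 to 1 at the knot, falls back to 0
b2 <- as.numeric(b[, 2])  # zero left of the knot, rises to 1 at the right boundary

cbind(x, b1, b2)
```

Here b1 peaks at the interior knot and b2 is identically zero for x <= 0, which is the compact-support property noted above.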

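You can get the (recursive) expression for B-splines from Definition of B-spline. Degree-0 B-splines are the most basic class, while

degree-1 B-splines are a linear combination of degree-0 B-splines
degree-2 B-splines are a linear combination of degree-1 B-splines
degree-3 B-splines are a linear combination of degree-2 B-splines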
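The recursion can be sketched in a few lines of R. This is an illustrative implementation of the standard Cox-de Boor recursion, not how the splines package computes its basis; the helper name bspline and the augmented knot vector t are introduced here for the example. It is compared with bs() at interior points only, because the half-open intervals used below do not cover the right boundary.

```r
library(splines)

# Cox-de Boor recursion: i-th B-spline of degree k on knot vector t (hypothetical helper)
bspline <- function(x, i, k, t) {
  if (k == 0) return(as.numeric(t[i] <= x & x < t[i + 1]))
  w <- function(num, den) if (den == 0) 0 else num / den  # convention: 0/0 = 0
  w(x - t[i], t[i + k] - t[i]) * bspline(x, i, k - 1, t) +
    w(t[i + k + 1] - x, t[i + k + 1] - t[i + 1]) * bspline(x, i + 1, k - 1, t)
}

# Degree 1: boundary knots repeated twice, one interior knot at 0
t <- c(-5, -5, 0, 5, 5)
x <- seq(-4.5, 4.5, by = 0.5)

# bs() drops the first basis function by default (intercept = FALSE),
# so its two columns correspond to i = 2 and i = 3
manual <- cbind(bspline(x, 2, 1, t), bspline(x, 3, 1, t))
b_ref  <- bs(x, degree = 1, knots = 0, Boundary.knots = c(-5, 5))

max_diff <- max(abs(manual - as.numeric(b_ref)))
```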

(Sorry, I was getting off-topic...)

Your linear regression using B-splines:

y ~ bs(x, degree = 1, knots = 0)

is just doing:

y ~ b1 + b2
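A minimal sketch of this equivalence, using simulated V-shaped data (hypothetical, for illustration only): fitting with bs() inside the formula and fitting on the extracted basis columns give the same coefficients.

```r
library(splines)

# Hypothetical noisy symmetric V-shaped data
set.seed(42)
x <- seq(-5, 5, length.out = 101)
y <- abs(x) + rnorm(length(x), sd = 0.25)

# Extract the two basis columns explicitly
b  <- bs(x, degree = 1, knots = 0)
b1 <- b[, 1]
b2 <- b[, 2]

fit_bs       <- lm(y ~ bs(x, degree = 1, knots = 0))
fit_explicit <- lm(y ~ b1 + b2)

# Same design matrix, hence the same estimates (only the labels differ)
coef_bs       <- unname(coef(fit_bs))
coef_explicit <- unname(coef(fit_explicit))
```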

Now you should be able to understand what the coefficients you get mean. Apart from the intercept, the fitted spline function is:

-5.12079 * b1 - 0.05545 * b2

Summary table:

Coefficients:
                                 Estimate Std. Error t value Pr(>|t|)  
(Intercept)                       4.93821    0.16117  30.639 1.40e-09 ***
bs(x, degree = 1, knots = c(0))1 -5.12079    0.24026 -21.313 2.47e-08 ***
bs(x, degree = 1, knots = c(0))2 -0.05545    0.21701  -0.256    0.805

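You might wonder why the coefficient of b2 is not significant. Well, compare your y with b1: your y is a symmetric V-shape, while b1 is an upside-down symmetric V-shape (a tent). If you first multiply b1 by -1 and then rescale it by a factor of 5 (this explains the coefficient of about -5 for b1), what do you get? After the intercept of about 5 shifts it up, a good match, right? So there is no need for b2.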

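However, if your y is asymmetric, running from (-5,5) to (0,0) and then to (5,10), you will notice that the coefficients for b1 and b2 are both significant. I think the other answer already gave you such an example.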
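Both cases can be illustrated on noise-free data. The sketch below solves the least-squares problem directly with qr.solve() rather than lm(), to avoid fitting a regression to an exact relationship; the target curves are idealized versions of the two shapes discussed above.

```r
library(splines)

x <- seq(-5, 5, length.out = 101)
b <- bs(x, degree = 1, knots = 0)

# Design matrix: intercept, b1, b2
X <- cbind(1, as.numeric(b[, 1]), as.numeric(b[, 2]))

# Symmetric V: (-5,5) -> (0,0) -> (5,5)
y_sym  <- abs(x)
# Asymmetric V: (-5,5) -> (0,0) -> (5,10)
y_asym <- ifelse(x < 0, -x, 2 * x)

coef_sym  <- as.numeric(qr.solve(X, y_sym))   # approx. 5, -5, 0: b2 is not needed
coef_asym <- as.numeric(qr.solve(X, y_asym))  # approx. 5, -5, 5: b2 is needed
```

In the symmetric case the curve is exactly 5 - 5 * b1, so b2 gets a coefficient of zero; in the asymmetric case the right arm ends at 10 rather than 5, and b2 picks up the difference.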

Reparametrizing a fitted B-spline as a piecewise polynomial is demonstrated in a separate linked answer.

regression

r

spline

lm

bspline