`poly()` 如何生成正交多项式?如何理解返回的"coefs"?
How `poly()` generates orthogonal polynomials? How to understand the "coefs" returned?
我对正交多项式的理解是它们采用以下形式
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... 达到所需的术语数
其中 a1、a2 etc 是每个正交项的系数(因拟合而异), c1, c2 etc 是正交项内的系数, 确定这些项保持正交性(使用相同 x 值的拟合之间一致)
我理解 poly()
用于拟合正交多项式。一个例子
x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) # abscissae not equally spaced
y = c(1.217395, 1.604360, 2.834947, 4.585687, 8.770932, 9.996260, 9.264800, 9.155079, 7.949278, 7.317690, 6.377519, 6.409620, 6.643426)
# construct the orthogonal polynomial
orth_poly <- poly(x, degree = 5)
# fit y to orthogonal polynomial
model <- lm(y ~ orth_poly)
我想同时提取系数 a1, a2 etc,作为正交系数 c1、c2 etc。我不知道该怎么做。我的猜测是
model$coefficients
returns 第一组系数,但我正在为如何提取其他系数而苦苦挣扎。也许在
之内
attributes(orth_poly)$coefs
?
非常感谢。
我刚刚意识到2年前有一个密切相关的问题Extracting orthogonal polynomial coefficients from R's poly() function?。那里的答案只是解释了 predict.poly
的作用,但我的答案给出了完整的画面。
第 1 节:poly
如何表示正交多项式
My understanding of orthogonal polynomials is that they take the form
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired
不不,没有这么干净的表格。 poly()
生成可由以下递归算法表示的一元正交多项式。这就是 predict.poly
生成线性预测矩阵的方式。令人意外的是,poly
本身并没有使用这样的递归,而是使用了一个蛮力:正交跨度的普通多项式模型矩阵的QR分解。但是,这相当于递归。
第 2 部分:poly()
的输出说明
让我们考虑一个例子。在你的 post,
中取 x
X <- poly(x, degree = 5)
# 1 2 3 4 5
# [1,] 0.484259711 0.48436462 0.48074040 0.351250507 0.25411350
# [2,] 0.406027697 0.20038942 -0.06236564 -0.303377083 -0.46801416
# [3,] 0.327795682 -0.02660187 -0.34049024 -0.338222850 -0.11788140
# ... ... ... ... ... ...
#[12,] -0.321069852 0.28705108 -0.15397819 -0.006975615 0.16978124
#[13,] -0.357884918 0.42236400 -0.40180712 0.398738364 -0.34115435
#attr(,"coefs")
#attr(,"coefs")$alpha
#[1] 1.054769 1.078794 1.063917 1.075700 1.063079
#
#attr(,"coefs")$norm2
#[1] 1.000000e+00 1.300000e+01 4.722031e-02 1.028848e-04 2.550358e-07
#[6] 5.567156e-10 1.156628e-12
以下是这些属性:
alpha[1]
给出x_bar = mean(x)
,即中心;
alpha - alpha[1]
给出 alpha0
, alpha1
, ..., alpha4
(alpha5
被计算但在 poly
returns X
,因为它不会在 predict.poly
); 中使用
norm2
的第一个值始终为 1。倒数第二个是 l0
、l1
、...、l5
,给出平方列X
的范数; l0
是删除的 P0(x - x_bar)
的列平方范数,它总是 n
(即 length(x)
);而第一个 1
只是被填充以便递归在 predict.poly
. 内进行
beta0
, beta1
, beta2
, ..., beta_5
不返回,但可以通过 norm2[-1] / norm2[-length(norm2)]
.[=156 计算=]
第 3 部分:使用 QR 分解和递归算法实现 poly
如前所述,poly
不使用递归,而predict.poly
使用。我个人不明白这种不一致设计背后的逻辑/原因。在这里我会提供一个函数 my_poly
自己写的,它使用递归来生成矩阵,如果 QR = FALSE
。当QR = TRUE
时,它是一个相似但不完全相同的实现poly
。代码注释非常好,有助于您理解这两种方法。
## return a model matrix for data `x`
my_poly <- function (x, degree = 1, QR = TRUE) {
## check feasibility
if (length(unique(x)) < degree)
stop("insufficient unique data points for specified degree!")
## centring covariates (so that `x` is orthogonal to intercept)
centre <- mean(x)
x <- x - centre
if (QR) {
## QR factorization of design matrix of ordinary polynomial
QR <- qr(outer(x, 0:degree, "^"))
## X <- qr.Q(QR) * rep(diag(QR$qr), each = length(x))
## i.e., column rescaling of Q factor by `diag(R)`
## also drop the intercept
X <- qr.qy(QR, diag(diag(QR$qr), length(x), degree + 1))[, -1, drop = FALSE]
## now columns of `X` are orthorgonal to each other
## i.e., `crossprod(X)` is diagonal
X2 <- X * X
norm2 <- colSums(X * X) ## squared L2 norm
alpha <- drop(crossprod(X2, x)) / norm2
beta <- norm2 / (c(length(x), norm2[-degree]))
colnames(X) <- 1:degree
}
else {
beta <- alpha <- norm2 <- numeric(degree)
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, nrow = length(x), ncol = degree, dimnames = list(NULL, 1:degree))
## compute alpha[1] and beta[1]
norm2[1] <- new_norm <- drop(crossprod(x))
alpha[1] <- sum(x ^ 3) / new_norm
beta[1] <- new_norm / length(x)
if (degree > 1L) {
old_norm <- new_norm
## second polynomial
X[, 2] <- Xi <- (x - alpha[1]) * X[, 1] - beta[1]
norm2[2] <- new_norm <- drop(crossprod(Xi))
alpha[2] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[2] <- new_norm / old_norm
old_norm <- new_norm
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- Xi <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
norm2[i] <- new_norm <- drop(crossprod(Xi))
alpha[i] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[i] <- new_norm / old_norm
old_norm <- new_norm
i <- i + 1
}
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
scale <- sqrt(norm2)
X <- X * rep(1 / scale, each = length(x))
## add attributes and return
attr(X, "coefs") <- list(centre = centre, scale = scale, alpha = alpha[-degree], beta = beta[-degree])
X
}
第 4 部分:my_poly
的输出说明
X <- my_poly(x, 5, FALSE)
生成的矩阵与 poly
生成的矩阵相同,因此被忽略。属性不一样
#attr(,"coefs")
#attr(,"coefs")$centre
#[1] 1.054769
#attr(,"coefs")$scale
#[1] 2.173023e-01 1.014321e-02 5.050106e-04 2.359482e-05 1.075466e-06
#attr(,"coefs")$alpha
#[1] 0.024025005 0.009147498 0.020930616 0.008309835
#attr(,"coefs")$beta
#[1] 0.003632331 0.002178825 0.002478848 0.002182892
my_poly
returns施工信息更明显:
centre
给出 x_bar = mean(x)
;
scale
给出列范数(poly
返回的 norm2
的平方根);
alpha
给出 alpha1
、alpha2
、alpha3
、alpha4
;
beta
给出 beta1
, beta2
, beta3
, beta4
.
第 5 部分:my_poly
的预测例程
由于my_poly
returns属性不同,stats:::predict.poly
不兼容my_poly
。这是适当的例程 my_predict_poly
:
## return a linear predictor matrix, given a model matrix `X` and new data `x`
my_predict_poly <- function (X, x) {
## extract construction info
coefs <- attr(X, "coefs")
centre <- coefs$centre
alpha <- coefs$alpha
beta <- coefs$beta
degree <- ncol(X)
## centring `x`
x <- x - coefs$centre
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, length(x), degree, dimnames = list(NULL, 1:degree))
if (degree > 1L) {
## second polynomial
X[, 2] <- (x - alpha[1]) * X[, 1] - beta[1]
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
i <- i + 1
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
X * rep(1 / coefs$scale, each = length(x))
}
考虑一个例子:
set.seed(0); x1 <- runif(5, min(x), max(x))
和
stats:::predict.poly(poly(x, 5), x1)
my_predict_poly(my_poly(x, 5, FALSE), x1)
给出完全相同的结果预测矩阵:
# 1 2 3 4 5
#[1,] 0.39726381 0.1721267 -0.10562568 -0.3312680 -0.4587345
#[2,] -0.13428822 -0.2050351 0.28374304 -0.0858400 -0.2202396
#[3,] -0.04450277 -0.3259792 0.16493099 0.2393501 -0.2634766
#[4,] 0.12454047 -0.3499992 -0.24270235 0.3411163 0.3891214
#[5,] 0.40695739 0.2034296 -0.05758283 -0.2999763 -0.4682834
请注意,预测例程只是采用现有构造信息而不是重建多项式。
第 6 节:只将 poly
和 predict.poly
视为黑框
很少需要了解里面的一切。对于统计建模,知道 poly
构造模型拟合的多项式基础就足够了,其系数可以在 lmObject$coefficients
中找到。进行预测时,predict.poly
永远不需要用户调用,因为 predict.lm
会为您完成。这样,直接把poly
和predict.poly
当成黑框就完全OK了。
我对正交多项式的理解是它们采用以下形式
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... 达到所需的术语数
其中 a1、a2 etc 是每个正交项的系数(因拟合而异), c1, c2 etc 是正交项内的系数, 确定这些项保持正交性(使用相同 x 值的拟合之间一致)
我理解 poly()
用于拟合正交多项式。一个例子
x = c(1.160, 1.143, 1.126, 1.109, 1.079, 1.053, 1.040, 1.027, 1.015, 1.004, 0.994, 0.985, 0.977) # abscissae not equally spaced
y = c(1.217395, 1.604360, 2.834947, 4.585687, 8.770932, 9.996260, 9.264800, 9.155079, 7.949278, 7.317690, 6.377519, 6.409620, 6.643426)
# construct the orthogonal polynomial
orth_poly <- poly(x, degree = 5)
# fit y to orthogonal polynomial
model <- lm(y ~ orth_poly)
我想同时提取系数 a1, a2 etc,作为正交系数 c1、c2 etc。我不知道该怎么做。我的猜测是
model$coefficients
returns 第一组系数,但我正在为如何提取其他系数而苦苦挣扎。也许在
之内attributes(orth_poly)$coefs
?
非常感谢。
我刚刚意识到2年前有一个密切相关的问题Extracting orthogonal polynomial coefficients from R's poly() function?。那里的答案只是解释了 predict.poly
的作用,但我的答案给出了完整的画面。
第 1 节:poly
如何表示正交多项式
My understanding of orthogonal polynomials is that they take the form
y(x) = a1 + a2(x - c1) + a3(x - c2)(x - c3) + a4(x - c4)(x - c5)(x - c6)... up to the number of terms desired
不不,没有这么干净的表格。 poly()
生成可由以下递归算法表示的一元正交多项式。这就是 predict.poly
生成线性预测矩阵的方式。令人意外的是,poly
本身并没有使用这样的递归,而是使用了一个蛮力:正交跨度的普通多项式模型矩阵的QR分解。但是,这相当于递归。
第 2 部分:poly()
让我们考虑一个例子。在你的 post,
中取x
X <- poly(x, degree = 5)
# 1 2 3 4 5
# [1,] 0.484259711 0.48436462 0.48074040 0.351250507 0.25411350
# [2,] 0.406027697 0.20038942 -0.06236564 -0.303377083 -0.46801416
# [3,] 0.327795682 -0.02660187 -0.34049024 -0.338222850 -0.11788140
# ... ... ... ... ... ...
#[12,] -0.321069852 0.28705108 -0.15397819 -0.006975615 0.16978124
#[13,] -0.357884918 0.42236400 -0.40180712 0.398738364 -0.34115435
#attr(,"coefs")
#attr(,"coefs")$alpha
#[1] 1.054769 1.078794 1.063917 1.075700 1.063079
#
#attr(,"coefs")$norm2
#[1] 1.000000e+00 1.300000e+01 4.722031e-02 1.028848e-04 2.550358e-07
#[6] 5.567156e-10 1.156628e-12
以下是这些属性:
alpha[1]
给出x_bar = mean(x)
,即中心;alpha - alpha[1]
给出alpha0
,alpha1
, ...,alpha4
(alpha5
被计算但在poly
returnsX
,因为它不会在predict.poly
); 中使用
norm2
的第一个值始终为 1。倒数第二个是l0
、l1
、...、l5
,给出平方列X
的范数;l0
是删除的P0(x - x_bar)
的列平方范数,它总是n
(即length(x)
);而第一个1
只是被填充以便递归在predict.poly
. 内进行
beta0
,beta1
,beta2
, ...,beta_5
不返回,但可以通过norm2[-1] / norm2[-length(norm2)]
.[=156 计算=]
第 3 部分:使用 QR 分解和递归算法实现 poly
如前所述,poly
不使用递归,而predict.poly
使用。我个人不明白这种不一致设计背后的逻辑/原因。在这里我会提供一个函数 my_poly
自己写的,它使用递归来生成矩阵,如果 QR = FALSE
。当QR = TRUE
时,它是一个相似但不完全相同的实现poly
。代码注释非常好,有助于您理解这两种方法。
## return a model matrix for data `x`
my_poly <- function (x, degree = 1, QR = TRUE) {
## check feasibility
if (length(unique(x)) < degree)
stop("insufficient unique data points for specified degree!")
## centring covariates (so that `x` is orthogonal to intercept)
centre <- mean(x)
x <- x - centre
if (QR) {
## QR factorization of design matrix of ordinary polynomial
QR <- qr(outer(x, 0:degree, "^"))
## X <- qr.Q(QR) * rep(diag(QR$qr), each = length(x))
## i.e., column rescaling of Q factor by `diag(R)`
## also drop the intercept
X <- qr.qy(QR, diag(diag(QR$qr), length(x), degree + 1))[, -1, drop = FALSE]
## now columns of `X` are orthorgonal to each other
## i.e., `crossprod(X)` is diagonal
X2 <- X * X
norm2 <- colSums(X * X) ## squared L2 norm
alpha <- drop(crossprod(X2, x)) / norm2
beta <- norm2 / (c(length(x), norm2[-degree]))
colnames(X) <- 1:degree
}
else {
beta <- alpha <- norm2 <- numeric(degree)
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, nrow = length(x), ncol = degree, dimnames = list(NULL, 1:degree))
## compute alpha[1] and beta[1]
norm2[1] <- new_norm <- drop(crossprod(x))
alpha[1] <- sum(x ^ 3) / new_norm
beta[1] <- new_norm / length(x)
if (degree > 1L) {
old_norm <- new_norm
## second polynomial
X[, 2] <- Xi <- (x - alpha[1]) * X[, 1] - beta[1]
norm2[2] <- new_norm <- drop(crossprod(Xi))
alpha[2] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[2] <- new_norm / old_norm
old_norm <- new_norm
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- Xi <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
norm2[i] <- new_norm <- drop(crossprod(Xi))
alpha[i] <- drop(crossprod(Xi * Xi, x)) / new_norm
beta[i] <- new_norm / old_norm
old_norm <- new_norm
i <- i + 1
}
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
scale <- sqrt(norm2)
X <- X * rep(1 / scale, each = length(x))
## add attributes and return
attr(X, "coefs") <- list(centre = centre, scale = scale, alpha = alpha[-degree], beta = beta[-degree])
X
}
第 4 部分:my_poly
X <- my_poly(x, 5, FALSE)
生成的矩阵与 poly
生成的矩阵相同,因此被忽略。属性不一样
#attr(,"coefs")
#attr(,"coefs")$centre
#[1] 1.054769
#attr(,"coefs")$scale
#[1] 2.173023e-01 1.014321e-02 5.050106e-04 2.359482e-05 1.075466e-06
#attr(,"coefs")$alpha
#[1] 0.024025005 0.009147498 0.020930616 0.008309835
#attr(,"coefs")$beta
#[1] 0.003632331 0.002178825 0.002478848 0.002182892
my_poly
returns施工信息更明显:
centre
给出x_bar = mean(x)
;scale
给出列范数(poly
返回的norm2
的平方根);alpha
给出alpha1
、alpha2
、alpha3
、alpha4
;beta
给出beta1
,beta2
,beta3
,beta4
.
第 5 部分:my_poly
由于my_poly
returns属性不同,stats:::predict.poly
不兼容my_poly
。这是适当的例程 my_predict_poly
:
## return a linear predictor matrix, given a model matrix `X` and new data `x`
my_predict_poly <- function (X, x) {
## extract construction info
coefs <- attr(X, "coefs")
centre <- coefs$centre
alpha <- coefs$alpha
beta <- coefs$beta
degree <- ncol(X)
## centring `x`
x <- x - coefs$centre
## repeat first polynomial `x` on all columns to initialize design matrix X
X <- matrix(x, length(x), degree, dimnames = list(NULL, 1:degree))
if (degree > 1L) {
## second polynomial
X[, 2] <- (x - alpha[1]) * X[, 1] - beta[1]
## further polynomials obtained from recursion
i <- 3
while (i <= degree) {
X[, i] <- (x - alpha[i - 1]) * X[, i - 1] - beta[i - 1] * X[, i - 2]
i <- i + 1
}
}
## column rescaling so that `crossprod(X)` is an identity matrix
X * rep(1 / coefs$scale, each = length(x))
}
考虑一个例子:
set.seed(0); x1 <- runif(5, min(x), max(x))
和
stats:::predict.poly(poly(x, 5), x1)
my_predict_poly(my_poly(x, 5, FALSE), x1)
给出完全相同的结果预测矩阵:
# 1 2 3 4 5
#[1,] 0.39726381 0.1721267 -0.10562568 -0.3312680 -0.4587345
#[2,] -0.13428822 -0.2050351 0.28374304 -0.0858400 -0.2202396
#[3,] -0.04450277 -0.3259792 0.16493099 0.2393501 -0.2634766
#[4,] 0.12454047 -0.3499992 -0.24270235 0.3411163 0.3891214
#[5,] 0.40695739 0.2034296 -0.05758283 -0.2999763 -0.4682834
请注意,预测例程只是采用现有构造信息而不是重建多项式。
第 6 节:只将 poly
和 predict.poly
视为黑框
很少需要了解里面的一切。对于统计建模,知道 poly
构造模型拟合的多项式基础就足够了,其系数可以在 lmObject$coefficients
中找到。进行预测时,predict.poly
永远不需要用户调用,因为 predict.lm
会为您完成。这样,直接把poly
和predict.poly
当成黑框就完全OK了。