Why do PLS regression coefficients with R [pls] differ from those from other R packages?
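Out of curiosity, I am trying to understand why the PLS regression coefficients obtained with pls differ from those obtained with plsRglm, ropls or plsdepot, which all give the same results.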
Here is some code to start with. I have tried playing with the scale, center and method arguments of the plsr function... but without success so far.
library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)
data(Cornell)
pls.plsr <- plsr(
Y~X1+X2+X3+X4+X5+X6+X7,
data = Cornell,
ncomp = 3,
scale = TRUE,
center = TRUE
)
plsRglm.plsr <- plsR(
Y~X1+X2+X3+X4+X5+X6+X7,
data = Cornell,
nt = 3,
scaleX = TRUE
)
ropls.plsr <- opls(
as.matrix(Cornell[, grep("X", colnames(Cornell))]),
Cornell[, "Y"],
scaleC = "standard"
)
plsdepot.plsr <- plsreg1(
as.matrix(Cornell[, grep("X", colnames(Cornell))]),
Cornell[, "Y"],
comps = 3
)
## extract PLS regression coefficients for the PLS model with three components
## (outputs marked with the same letter are identical)
coef(pls.plsr)                          # a: differs from all the others
coef(plsRglm.plsr, type = "original")   # b
coef(plsRglm.plsr, type = "scaled")     # c
coef(ropls.plsr)                        # c
plsdepot.plsr$std.coefs                 # c
plsdepot.plsr$reg.coefs                 # b
First, reformatting your code a little, we write:
library(pls)
library(plsRglm)
library(ropls)
library(plsdepot)
data(Cornell)
pls.plsr <- plsr(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7,
data = Cornell,
ncomp = 3, scale = TRUE, center = TRUE)
plsRglm.plsr <- plsR(Y ~ X1 + X2 + X3 + X4 + X5 + X6 + X7,
data = Cornell,
nt = 3, scaleX = TRUE)
ropls.plsr <- opls(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
Cornell[, "Y"], scaleC = "standard")
plsdepot.plsr <- plsreg1(as.matrix(Cornell[, grep("X", colnames(Cornell))]),
Cornell[, "Y"], comps = 3)
Once that is done, you can extract the coefficients on the original scale:
### ORIGINAL SCALE - plsRglm, plsdepot
coef(plsRglm.plsr, type = "original")
plsdepot.plsr$reg.coefs
Or you can extract the scaled ones:
### SCALED - plsRglm, ropls, plsdepot
coef(plsRglm.plsr, type = "scaled")
coef(ropls.plsr)
plsdepot.plsr$std.coefs
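Presumably the two sets of numbers are linked by the usual back-transformation for standardised coefficients: each scaled coefficient is multiplied by sd(y) / sd(x_j), and the intercept is recovered from the means. The sketch below checks that relation in base R. It uses the built-in mtcars data and ordinary least squares (lm) as a stand-in for a PLS fit, because the conversion is the same for any linear predictor fitted to centred and scaled data; the data set and variable names are illustrative assumptions, not part of the question.

```r
# Standardised vs original-scale coefficients of a linear predictor.
# Assumption: lm() stands in for a PLS fit; the back-transformation is the
# same for any linear model fitted on centred and scaled data.
X <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
y <- mtcars$mpg

Xs <- scale(X)                    # centre and scale each column of X
ys <- drop(scale(y))              # centre and scale y
b_std <- coef(lm(ys ~ Xs - 1))    # "scaled" (standardised) coefficients

# Back-transform to the original units of X and y
b_orig <- b_std * sd(y) / apply(X, 2, sd)
b0 <- mean(y) - sum(b_orig * colMeans(X))

# Both parameterisations give the same fitted values ...
yhat_std  <- mean(y) + sd(y) * drop(Xs %*% b_std)
yhat_orig <- b0 + drop(X %*% b_orig)
all.equal(yhat_std, yhat_orig)                           # TRUE
# ... and match a direct fit on the original scale
all.equal(unname(c(b0, b_orig)), unname(coef(lm(y ~ X))))  # TRUE
```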
So all the methods now give the same coefficients... except pls::plsr. Why, you may ask? The key is in the command. When you run:
coef(pls.plsr) # , , 3 comps
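You see ", , 3 comps". That is the hallmark of a three-dimensional array (a tensor). Why? The coefficients should just be a vector. The reason is that coef is a generic function, and for pls::plsr models it does not return what the other packages return. Look at what it actually extracts: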
pls.plsr$coefficients
matrix(pls.plsr$coefficients, ncol = 3) # the same in matrix form, one column per number of components; coef simply extracts the third column
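To see where the ", , 3 comps" header comes from, here is a made-up array with the same shape as pls.plsr$coefficients for this model: 7 predictors × 1 response × 3 component counts. The values are placeholders and the dimnames only mimic the labels printed above. Flattening it with matrix(..., ncol = 3) gives one column per number of components, and the third slice is the one for the 3-component model.

```r
# Hypothetical stand-in for pls.plsr$coefficients:
# predictors x responses x number of components (values are placeholders)
B <- array(seq_len(21), dim = c(7, 1, 3),
           dimnames = list(paste0("X", 1:7), "Y", paste(1:3, "comps")))
B                          # printed slice by slice: ", , 1 comps", ..., ", , 3 comps"

M <- matrix(B, ncol = 3)   # one column per number of components
B[, , 3]                   # slice for the 3-component model (a named vector)
all.equal(M[, 3], unname(B[, , 3]))   # TRUE: the third column is that slice
```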
But if you inspect the equivalent objects in each R package, you will see that all the models have the same fit:
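matrix(pls.plsr$projection, ncol = 3)  # pls: projection matrix
plsRglm.plsr$wwetoile                  # plsRglm: W* ("w étoile") weights
plsdepot.plsr$mod.wgs                  # plsdepot: modified weights
ropls.plsr@weightStarMN                # ropls: W* weights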
So the fits agree; with pls::plsr, coef simply isn't extracting the same coefficients as the other packages.
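For reference, the objects compared above appear to be the W* matrix, W (P'W)^{-1}, which turns the (scaled) X directly into the scores. The sketch below is a minimal PLS1 NIPALS in base R; pls1_nipals() is a hypothetical helper written for illustration, not the code of any of these packages, and mtcars is again only example data. It checks that X W* reproduces the scores. It also shows one thing worth keeping in mind when comparing packages: in PLS1 the weights do not depend on how y is scaled, but the coefficients do, by the constant factor sd(y). As far as I know, scale = TRUE in pls::plsr scales X only and leaves Y on its original scale; if that is right, comparing drop(coef(pls.plsr)) with the scaled coefficients times sd(Cornell$Y) is a quick check, but treat it as an assumption to verify.

```r
# Minimal PLS1 (NIPALS) sketch; pls1_nipals() is a hypothetical helper,
# not the implementation used by pls, plsRglm, ropls or plsdepot.
pls1_nipals <- function(X, y, ncomp) {
  W <- P <- matrix(0, ncol(X), ncomp)
  q <- numeric(ncomp)
  scores <- matrix(0, nrow(X), ncomp)
  E <- X                                  # residual X block
  f <- y                                  # residual y
  for (a in seq_len(ncomp)) {
    w  <- drop(crossprod(E, f))
    w  <- w / sqrt(sum(w^2))              # unit-length weight vector w_a
    sc <- drop(E %*% w)                   # score vector t_a
    tt <- sum(sc^2)
    p  <- drop(crossprod(E, sc)) / tt     # X loading p_a
    q[a] <- sum(f * sc) / tt              # y loading q_a
    E <- E - outer(sc, p)                 # deflate X
    f <- f - sc * q[a]                    # deflate y
    W[, a] <- w
    P[, a] <- p
    scores[, a] <- sc
  }
  Wstar <- W %*% solve(crossprod(P, W))   # W* = W (P'W)^{-1}
  list(scores = scores, Wstar = Wstar, q = q, coef = drop(Wstar %*% q))
}

X  <- as.matrix(mtcars[, c("disp", "hp", "wt", "qsec")])
y  <- mtcars$mpg
Xs <- scale(X)

fit_std <- pls1_nipals(Xs, drop(scale(y)), ncomp = 3)  # y centred and scaled
fit_raw <- pls1_nipals(Xs, y - mean(y), ncomp = 3)     # y centred only

# Scores come straight from the scaled X through W*
all.equal(unname(Xs %*% fit_std$Wstar), fit_std$scores)  # TRUE
# Scaling y leaves W* unchanged ...
all.equal(fit_std$Wstar, fit_raw$Wstar)                 # TRUE
# ... but multiplies every coefficient by sd(y)
all.equal(fit_raw$coef, fit_std$coef * sd(y))           # TRUE
```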