Nested resampling + LASSO (regr.cvglmnet) using mlr
I am trying to do nested resampling with 10 CV iterations in the inner loop and 10 in the outer loop, using regr.cvglmnet. mlr provides example code for this using a wrapper function (https://mlr-org.github.io/mlr/articles/tutorial/devel/nested_resampling.html).

Now, I changed only two things in the code they provide:

1) "regr.cvglmnet" instead of the support vector machine (ksvm)

2) the number of iterations of the inner and outer loops

After the makeTuneWrapper (lrn) call I get the error shown below. Can someone explain it to me? I am completely new to coding and machine learning, so I may have done something very silly in the code...
ps = makeParamSet(
  makeDiscreteParam("C", values = 2^(-12:12)),
  makeDiscreteParam("sigma", values = 2^(-12:12))
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Subsample", iters = 10)
lrn = makeTuneWrapper("regr.cvglmnet", resampling = inner, par.set = ps,
                      control = ctrl, show.info = FALSE)
# Error in checkTunerParset(learner, par.set, measures, control) :
#   Can only tune parameters for which learner parameters exist: C,sigma

### Outer resampling loop
outer = makeResampleDesc("CV", iters = 10)
r = resample(lrn, iris.task, resampling = outer, extract = getTuneResult,
             show.info = FALSE)
The error message tells you that you cannot tune parameters that mlr does not know about for this learner -- regr.cvglmnet has no C and sigma parameters. You can get the parameters that mlr knows about for a learner with the getLearnerParamSet() function:
> getLearnerParamSet(makeLearner("regr.cvglmnet"))
                      Type           len  Def         Constr                 Req  Tunable  Trafo
family                discrete       -    gaussian    gaussian,poisson       -    TRUE     -
alpha                 numeric        -    1           0 to 1                 -    TRUE     -
nfolds                integer        -    10          3 to Inf               -    TRUE     -
type.measure          discrete       -    mse         mse,mae                -    TRUE     -
s                     discrete       -    lambda.1se  lambda.1se,lambda.min  -    TRUE     -
nlambda               integer        -    100         1 to Inf               -    TRUE     -
lambda.min.ratio      numeric        -    -           0 to 1                 -    TRUE     -
standardize           logical        -    TRUE        -                      -    TRUE     -
intercept             logical        -    TRUE        -                      -    TRUE     -
thresh                numeric        -    1e-07       0 to Inf               -    TRUE     -
dfmax                 integer        -    -           0 to Inf               -    TRUE     -
pmax                  integer        -    -           0 to Inf               -    TRUE     -
exclude               integervector  -    -           1 to Inf               -    TRUE     -
penalty.factor        numericvector  -    -           0 to 1                 -    TRUE     -
lower.limits          numericvector  -    -           -Inf to 0              -    TRUE     -
upper.limits          numericvector  -    -           0 to Inf               -    TRUE     -
maxit                 integer        -    100000      1 to Inf               -    TRUE     -
type.gaussian         discrete       -    -           covariance,naive       -    TRUE     -
fdev                  numeric        -    1e-05       0 to 1                 -    TRUE     -
devmax                numeric        -    0.999       0 to 1                 -    TRUE     -
eps                   numeric        -    1e-06       0 to 1                 -    TRUE     -
big                   numeric        -    9.9e+35     -Inf to Inf            -    TRUE     -
mnlam                 integer        -    5           1 to Inf               -    TRUE     -
pmin                  numeric        -    1e-09       0 to 1                 -    TRUE     -
exmx                  numeric        -    250         -Inf to Inf            -    TRUE     -
prec                  numeric        -    1e-10       -Inf to Inf            -    TRUE     -
mxit                  integer        -    100         1 to Inf               -    TRUE     -
You can use any of these parameters to define a valid parameter set for tuning this particular learner, for example:
ps = makeParamSet(
  makeDiscreteParam("family", values = c("gaussian", "poisson")),
  makeDiscreteParam("alpha", values = 0.1*0:10)
)
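Applied to the original nested-resampling setup, a minimal sketch could look like the one below. This is an illustration, not verified output: it tunes alpha (a parameter regr.cvglmnet actually has) and swaps in the regression task bh.task, since a regression learner cannot be resampled on the classification task iris.task.

library(mlr)

# tune a parameter that regr.cvglmnet actually exposes, e.g. the
# elastic-net mixing parameter alpha (alpha = 1 is the LASSO)
ps = makeParamSet(
  makeDiscreteParam("alpha", values = 0.1 * 0:10)
)
ctrl = makeTuneControlGrid()
inner = makeResampleDesc("Subsample", iters = 10)
lrn = makeTuneWrapper("regr.cvglmnet", resampling = inner, par.set = ps,
                      control = ctrl, show.info = FALSE)

### Outer resampling loop -- a regression task this time
outer = makeResampleDesc("CV", iters = 10)
r = resample(lrn, bh.task, resampling = outer, extract = getTuneResult,
             show.info = FALSE)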
When using the LASSO with glmnet, you only need to tune s. This is the important parameter that is used when the model predicts on new data.

The parameter lambda has absolutely no effect on the predictions because of the way the package is coded. If you set s to a value different from any of the lambda values that were already chosen, the model is refit with s as the penalty term.

By default, several models with different lambda values are fit during the train call. For prediction, however, a new model is fit with the best lambda value, so effectively the tuning is done in the prediction step.

A good default range for s can be chosen by

- training a model with the defaults from glmnet,
- checking the minimum and maximum of lambda, and
- using these as the lower and upper limits for s, then tuning it with mlr.

The reprex below walks through these steps. See also this discussion.
library(mlr)
#> Loading required package: ParamHelpers
lrn_glmnet <- makeLearner("regr.glmnet",
                          alpha = 1,
                          intercept = FALSE)
# check lambda
glmnet_train = mlr::train(lrn_glmnet, bh.task)
summary(glmnet_train$learner.model$lambda)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 143.5 157.4 172.8 174.3 189.6 208.1
# set limits
ps_glmnet <- makeParamSet(makeNumericParam("s", lower = 140, upper = 208))
# tune params in parallel using a grid search for simplicity
tune.ctrl = makeTuneControlGrid()
inner <- makeResampleDesc("CV", iters = 2)
configureMlr(on.learner.error = "warn", on.error.dump = TRUE)
library(parallelMap)
parallelStart(mode = "multicore", level = "mlr.tuneParams", cpus = 4,
              mc.set.seed = TRUE) # only parallelize the tuning
#> Starting parallelization in mode=multicore with cpus=4.
set.seed(12345)
params_tuned_glmnet = tuneParams(lrn_glmnet, task = bh.task, resampling = inner,
                                 par.set = ps_glmnet, control = tune.ctrl,
                                 measures = list(rmse))
#> [Tune] Started tuning learner regr.glmnet for parameter set:
#> Type len Def Constr Req Tunable Trafo
#> s numeric - - 140 to 208 - TRUE -
#> With control class: TuneControlGrid
#> Imputation value: Inf
#> Mapping in parallel: mode = multicore; cpus = 4; elements = 10.
#> [Tune] Result: s=140 : rmse.test.rmse=17.9803086
parallelStop()
#> Stopped parallelization. All cleaned up.
# train the model on the whole dataset using the `s` value from the tuning
lrn_glmnet_tuned <- makeLearner("regr.glmnet",
                                alpha = 1,
                                s = 140,
                                intercept = FALSE)
# lambda = sort(seq(0, 5, length.out = 100), decreasing = T))
glmnet_train_tuned = mlr::train(lrn_glmnet_tuned, bh.task)
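As a follow-up (not part of the reprex above), you could sanity-check the tuned model by predicting and scoring it with mlr. Predicting on the training task, as in this sketch, gives the apparent error, so expect it to be optimistic:

# predict with the tuned model and compute the RMSE; this is the
# apparent (training) error, shown only as an illustration
pred = predict(glmnet_train_tuned, task = bh.task)
performance(pred, measures = rmse)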
Created on 2018-07-03 by the reprex package (v0.2.0).
devtools::session_info()
#> Session info -------------------------------------------------------------
#> setting value
#> version R version 3.5.0 (2018-04-23)
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> tz Europe/Berlin
#> date 2018-07-03
#> Packages -----------------------------------------------------------------
#> package * version date source
#> backports 1.1.2 2017-12-13 CRAN (R 3.5.0)
#> base * 3.5.0 2018-06-04 local
#> BBmisc 1.11 2017-03-10 CRAN (R 3.5.0)
#> bit 1.1-14 2018-05-29 cran (@1.1-14)
#> bit64 0.9-7 2017-05-08 CRAN (R 3.5.0)
#> blob 1.1.1 2018-03-25 CRAN (R 3.5.0)
#> checkmate 1.8.5 2017-10-24 CRAN (R 3.5.0)
#> codetools 0.2-15 2016-10-05 CRAN (R 3.5.0)
#> colorspace 1.3-2 2016-12-14 CRAN (R 3.5.0)
#> compiler 3.5.0 2018-06-04 local
#> data.table 1.11.4 2018-05-27 CRAN (R 3.5.0)
#> datasets * 3.5.0 2018-06-04 local
#> DBI 1.0.0 2018-05-02 cran (@1.0.0)
#> devtools 1.13.6 2018-06-27 CRAN (R 3.5.0)
#> digest 0.6.15 2018-01-28 CRAN (R 3.5.0)
#> evaluate 0.10.1 2017-06-24 CRAN (R 3.5.0)
#> fastmatch 1.1-0 2017-01-28 CRAN (R 3.5.0)
#> foreach 1.4.4 2017-12-12 CRAN (R 3.5.0)
#> ggplot2 2.2.1 2016-12-30 CRAN (R 3.5.0)
#> git2r 0.21.0 2018-01-04 CRAN (R 3.5.0)
#> glmnet 2.0-16 2018-04-02 CRAN (R 3.5.0)
#> graphics * 3.5.0 2018-06-04 local
#> grDevices * 3.5.0 2018-06-04 local
#> grid 3.5.0 2018-06-04 local
#> gtable 0.2.0 2016-02-26 CRAN (R 3.5.0)
#> htmltools 0.3.6 2017-04-28 CRAN (R 3.5.0)
#> iterators 1.0.9 2017-12-12 CRAN (R 3.5.0)
#> knitr 1.20 2018-02-20 CRAN (R 3.5.0)
#> lattice 0.20-35 2017-03-25 CRAN (R 3.5.0)
#> lazyeval 0.2.1 2017-10-29 CRAN (R 3.5.0)
#> magrittr 1.5 2014-11-22 CRAN (R 3.5.0)
#> Matrix 1.2-14 2018-04-09 CRAN (R 3.5.0)
#> memoise 1.1.0 2017-04-21 CRAN (R 3.5.0)
#> memuse 4.0-0 2017-11-10 CRAN (R 3.5.0)
#> methods * 3.5.0 2018-06-04 local
#> mlr * 2.13 2018-07-01 local
#> munsell 0.5.0 2018-06-12 CRAN (R 3.5.0)
#> parallel 3.5.0 2018-06-04 local
#> parallelMap * 1.3 2015-06-10 CRAN (R 3.5.0)
#> ParamHelpers * 1.11 2018-06-25 CRAN (R 3.5.0)
#> pillar 1.2.3 2018-05-25 CRAN (R 3.5.0)
#> plyr 1.8.4 2016-06-08 CRAN (R 3.5.0)
#> Rcpp 0.12.17 2018-05-18 cran (@0.12.17)
#> rlang 0.2.1 2018-05-30 CRAN (R 3.5.0)
#> rmarkdown 1.10 2018-06-11 CRAN (R 3.5.0)
#> rprojroot 1.3-2 2018-01-03 CRAN (R 3.5.0)
#> RSQLite 2.1.1 2018-05-06 cran (@2.1.1)
#> scales 0.5.0 2017-08-24 CRAN (R 3.5.0)
#> splines 3.5.0 2018-06-04 local
#> stats * 3.5.0 2018-06-04 local
#> stringi 1.2.3 2018-06-12 CRAN (R 3.5.0)
#> stringr 1.3.1 2018-05-10 CRAN (R 3.5.0)
#> survival 2.42-3 2018-04-16 CRAN (R 3.5.0)
#> tibble 1.4.2 2018-01-22 CRAN (R 3.5.0)
#> tools 3.5.0 2018-06-04 local
#> utils * 3.5.0 2018-06-04 local
#> withr 2.1.2 2018-03-15 CRAN (R 3.5.0)
#> XML 3.98-1.11 2018-04-16 CRAN (R 3.5.0)
#> yaml 2.1.19 2018-05-01 CRAN (R 3.5.0)