Set hyperparameters to a learner in mlr after parameter tuning
I'm building a classification task in R with the mlr package. To tune the hyperparameters I'm using a validation set, and one of the parameters is the percentage of variables to use in importance-based feature selection (the chi.squared method):
lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")
params <- makeParamSet(
  makeDiscreteParam("booster", values = c("gbtree", "dart")),
  makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
  makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
  makeIntegerParam("max_depth", lower = 3L, upper = 10L),
  makeNumericParam("min_child_weight", lower = 1, upper = 10),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))
rdesc = makeResampleDesc("CV", iters = 5)
ctrl <- makeTuneControlRandom(maxit = 1L)
res = tuneParams(lrn, task = valTask2016, resampling = rdesc, par.set = params, control = ctrl)
I'm not sure whether I need 5-fold cross-validation here, but the variable res gives me all the parameters I need, including fw.perc, which prunes my variables ranked in decreasing order of importance.
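For intuition, fw.perc controls what fraction of the top-ranked features the filter keeps. A base-R sketch of that pruning (the scores and feature names below are made up for illustration, not taken from the actual task):

```r
# Hypothetical filter scores (higher = more important)
scores <- c(age = 0.9, fare = 0.7, sibsp = 0.3, parch = 0.1)

fw.perc <- 0.5  # keep the top 50% of features

# Rank features by score and keep the top fraction
keep.n <- ceiling(fw.perc * length(scores))
kept <- names(sort(scores, decreasing = TRUE))[seq_len(keep.n)]
```

Here kept contains only the two highest-scoring features.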
My question is: how can I use these parameters to resample again (this time with Subsampling), but now on the training data? This is what I have:
rdesc = makeResampleDesc("Subsample", iters = 5, split = 0.8)
lrn = setHyperPars(makeLearner("classif.xgboost"), par.vals = res$x)
r = resample(lrn, trainTask2016, rdesc, measures = list(mmce, fpr, fnr, timetrain))
In this case, valTask2016 is the classification task I used to validate the parameters. I used createDummyFeatures for the one-hot encoding that XGBoost requires. This is the error I get:
Error in setHyperPars2.Learner(learner, insert(par.vals, args)) :
classif.xgboost: Setting parameter fw.perc without available description object!
Did you mean one of these hyperparameters instead: booster eta alpha
I believe the reason you get this error is that your second learner is a "simple" xgboost learner, not an xgboost learner wrapped by a filter like your first one (makeFilterWrapper returns a learner).
So, you have two options:
- You define a new parameter set for the second training in which you "copy" only the parts of res$x that refer to xgboost, i.e. without fw.perc.
- You wrap your second xgboost learner with the same filter.
I hope this makes sense.
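The first option can be sketched in a few lines. The parameter names and values below are illustrative stand-ins for what tuneParams() returns in res$x; the idea is simply to drop the wrapper-only entry before handing the rest to a plain learner:

```r
# res$x from tuneParams() mixes wrapper and xgboost parameters, e.g.:
tuned.pars <- list(booster = "gbtree", nrounds = 1000, eta = 0.1,
                   max_depth = 5, fw.perc = 0.6)  # illustrative values

# Keep everything except the filter wrapper's fw.perc
xgb.pars <- tuned.pars[setdiff(names(tuned.pars), "fw.perc")]

# xgb.pars can now be set on an unwrapped learner, e.g.:
# lrn <- setHyperPars(makeLearner("classif.xgboost"), par.vals = xgb.pars)
```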
Edit: this works for me with the second option, using the Titanic dataset:
library(mlr)
library(dplyr)
library(titanic)
sample <- sample.int(n = nrow(titanic_train), size = floor(.7 * nrow(titanic_train)), replace = FALSE)
train <- titanic_train[sample, ] %>% select(Pclass, Sex, Age, SibSp, Fare, Survived) %>% mutate(Sex = ifelse(Sex == 'male', 0, 1))
lrn = makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared")
params <- makeParamSet(
  makeDiscreteParam("booster", values = c("gbtree", "dart")),
  makeDiscreteParam("nrounds", values = 1000, tunable = FALSE),
  makeDiscreteParam("eta", values = c(0.1, 0.05, 0.2)),
  makeIntegerParam("max_depth", lower = 3L, upper = 10L),
  makeNumericParam("min_child_weight", lower = 1, upper = 10),
  makeNumericParam("subsample", lower = 0.5, upper = 1),
  makeNumericParam("colsample_bytree", lower = 0.5, upper = 1),
  makeDiscreteParam("fw.perc", values = seq(0.2, 1, 0.05)))
classif.task <- mlr::makeClassifTask(data = train,
target = "Survived",
positive = "1")
rdesc = makeResampleDesc("CV", iters = 3)
ctrl <- makeTuneControlRandom(maxit = 2L)
res = tuneParams(lrn, task = classif.task, resampling = rdesc, par.set = params, control = ctrl)
##########################
test <- titanic_train[-sample,] %>% select(Pclass, Sex, Age, SibSp, Fare, Survived) %>% mutate(Sex = ifelse(Sex == 'male', 0, 1))
lrn2 = setHyperPars(makeFilterWrapper(learner = "classif.xgboost", fw.method = "chi.squared"), par.vals = res$x)
classif.task2 <- mlr::makeClassifTask(data = test,
target = "Survived",
positive = "1")
rdesc = makeResampleDesc("CV", iters = 3)
r = resample(learner = lrn2, task = classif.task2, resampling = rdesc, show.info = TRUE, models = TRUE)