Tuning randomForest cutoffs with the mlr package
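I have been exploring the wonderful mlr package with the Titanic data set. My issue is with implementing a random forest. More specifically, I would like to tune cutoff (i.e., the threshold for assigning an impure leaf to a given class). The problem is that the cutoff parameter takes two values, but I have only been able to figure out how to specify single-valued hyperparameters in mlr.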
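For reference, the randomForest documentation describes cutoff as a vector with one entry per class, where the winning class is the one with the largest ratio of vote proportion to cutoff. The base R sketch below only illustrates that rule; `pick_class` and the vote matrix are hypothetical and made up for this example, and are not part of randomForest or mlr.

```r
# Hypothetical helper (not part of randomForest or mlr): for each row of a
# vote-proportion matrix, pick the class with the largest votes/cutoff ratio.
pick_class <- function(votes, cutoff) {
  stopifnot(ncol(votes) == length(cutoff))
  ratios <- sweep(votes, 2, cutoff, "/")  # divide each class column by its cutoff
  colnames(votes)[max.col(ratios, ties.method = "first")]
}

# Made-up vote proportions for three observations (classes "0" and "1").
votes <- matrix(c(0.60, 0.40,
                  0.70, 0.30,
                  0.80, 0.20),
                ncol = 2, byrow = TRUE,
                dimnames = list(NULL, c("0", "1")))

pick_class(votes, c(0.50, 0.50))  # equal cutoffs: plain majority vote
pick_class(votes, c(0.75, 0.25))  # class "1" now wins with a smaller share of votes
```

With equal cutoffs every row goes to class "0"; with c(0.75, 0.25) the first two rows switch to class "1". This is why the parameter needs one value per class rather than a single number.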
Code:
library(mlr)
library(dplyr)
dTrain <- read.csv('path/to/data/')
# Defining the Task
trainTask <- makeClassifTask(data = dTrain %>%
                               select(-Name, -Ticket, -Cabin) %>%
                               filter(complete.cases(.)),
                             target = "Survived",
                             id = "PassengerId")
# Defining the Learner
rfLRN <- makeLearner("classif.randomForest")
# Defining the Parameter Space
ps <- makeParamSet(
  makeDiscreteParam("cutoff", values = list(c(.5, .5), c(.75, .25)))
)
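This is where the problem lies: cutoff needs two values, but I am not sure how to pass both of them. The attempt above is wrong. I have tried several other parameter constructors, namely makeDiscreteVectorParam and so on, but to no avail. Any suggestions?
If I instead tune a parameter such as mtry (i.e., the number of features selected at a given split), everything works fine.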
# Defining the Hyperparameter Space
ps <- makeParamSet(
  makeDiscreteParam("mtry", values = c(2, 3, 4, 5))
)
# Defining Resampling
cvTask <- makeResampleDesc("CV", iters = 5L)
# Defining Search
search <- makeTuneControlGrid()
# Tune!
tune <- tuneParams(learner = rfLRN
                   ,task = trainTask
                   ,resampling = cvTask
                   ,measures = list(acc)
                   ,par.set = ps
                   ,control = search
                   ,show.info = TRUE)
It appears that you need to give names to those discrete cutoff values, for example:
# Defining the Parameter Space
ps <- makeParamSet(
  makeDiscreteParam("cutoff", values = list(
    a = c(.50, .50),
    b = c(.75, .25)))
)
Output:
> tune <- tuneParams(learner = rfLRN
+ ,task = trainTask
+ ,resampling = cvTask
+ ,measures = list(acc)
+ ,par.set = ps
+ ,control = search
+ ,show.info = TRUE)
[Tune] Started tuning learner classif.randomForest for parameter set:
           Type len Def Constr Req Tunable Trafo
cutoff discrete   -   -    a,b   -    TRUE     -
With control class: TuneControlGrid
Imputation value: -0
[Tune-x] 1: cutoff=a
[Tune-y] 1: acc.test.mean=0.828; time: 0.0 min
[Tune-x] 2: cutoff=b
[Tune-y] 2: acc.test.mean=0.776; time: 0.0 min
[Tune] Result: cutoff=a : acc.test.mean=0.828
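Once tuning has finished, the selected cutoff can be put back on the learner. The following is a minimal sketch rather than a definitive recipe. Because the Titanic CSV path is elided in the question, it uses `sonar.task` (a two-class task bundled with mlr) as a stand-in, and `ntree = 100L` and `iters = 3L` are arbitrary choices to keep it quick. It also assumes that the `x` element of the tuning result holds the selected cutoff vector itself rather than the label "a" or "b", which is worth checking by printing it.

```r
library(mlr)

set.seed(42)  # only to make the resampling splits repeatable

# Stand-in for the question's trainTask (assumption: any two-class task works here).
task <- sonar.task

rfLRN <- makeLearner("classif.randomForest", ntree = 100L)

# Named list of candidate cutoff vectors, as in the answer above.
ps <- makeParamSet(
  makeDiscreteParam("cutoff", values = list(
    a = c(.50, .50),
    b = c(.75, .25)))
)

res <- tuneParams(learner = rfLRN, task = task,
                  resampling = makeResampleDesc("CV", iters = 3L),
                  measures = list(acc), par.set = ps,
                  control = makeTuneControlGrid(), show.info = FALSE)

# Inspect what was selected; expected to be a numeric vector of length 2.
print(res$x$cutoff)

# Apply the tuned setting to the learner and fit on the full task.
tunedLRN <- setHyperPars(rfLRN, par.vals = res$x)
model <- train(tunedLRN, task)
```

The names `a` and `b` only serve as labels in the tuning log; the learner receives the underlying numeric vector.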