How to interpret/tune a multinomial classification with caret-GBM?
Two questions:
- visualizing the model's error
- calculating the log loss

(1) I'm trying to tune a multinomial GBM classifier, but I'm not sure how to make sense of the output. I understand that LogLoss is meant to be minimized, yet in the plot below it only seems to increase over any range of iterations or trees.
inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
training <- final_data[inTraining,]
testing <- final_data[-inTraining,]
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                           verboseIter = FALSE, savePredictions = TRUE,
                           classProbs = TRUE, summaryFunction = mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = (1:5)*2, .n.trees = (1:10)*25,
                        .shrinkage = 0.1, .n.minobsinnode = 10)
gbmFit1 <- train(label ~ ., data = training, method = "gbm",
                 trControl = fitControl, verbose = 1,
                 metric = "ROC", tuneGrid = gbmGrid1)
plot(gbmFit1)
--
(2) On a related note, I get the following error when I try to investigate mnLogLoss directly, which keeps me from quantifying the error.
mnLogLoss(testing, levels(testing$label)) : 'lev' cannot be NULL
I suspect you have set the learning rate too high. So, using an example dataset:
final_data <- iris
final_data$label <- final_data$Species
final_data$Species <- NULL
inTraining <- createDataPartition(final_data$label, p = 0.80, list = FALSE)
training <- final_data[inTraining,]
testing <- final_data[-inTraining,]
fitControl <- trainControl(method = "repeatedcv", number = 10, repeats = 3,
                           verboseIter = FALSE, savePredictions = TRUE,
                           classProbs = TRUE, summaryFunction = mnLogLoss)
gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10)*10,
                        .shrinkage = 0.1, .n.minobsinnode = 10)
gbmFit1 <- train(label ~ ., data = training, method = "gbm",
                 trControl = fitControl, verbose = 1,
                 tuneGrid = gbmGrid1, metric = "logLoss")
plot(gbmFit1)
It looks a bit different from yours, but you can see the upward trend after 20 trees. It really depends on your data: with a high learning rate you reach the minimum quickly, and anything after that just introduces noise. You can see this illustrated in Boehmke's book, and also check out a more statistics-based discussion.
Let's lower the learning rate, and you can see:
gbmGrid1 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10)*10,
                        .shrinkage = 0.01, .n.minobsinnode = 10)
gbmFit1 <- train(label ~ ., data = training, method = "gbm",
                 trControl = fitControl, verbose = 1,
                 tuneGrid = gbmGrid1, metric = "logLoss")
plot(gbmFit1)
Note that you will most likely need more iterations to reach a lower loss, as you saw in your first plot.
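A sketch of both follow-ups, reusing `training`, `testing`, and `fitControl` from above (the names `gbmGrid2`, `gbmFit2`, `probs`, and `eval_df` are mine, not from the original): it widens the tree range at the lower shrinkage, then evaluates the held-out multinomial log loss. On the error in (2): `mnLogLoss()` expects a data frame with an `obs` factor column plus one predicted-probability column per class level, and the class levels passed via `lev` — you cannot hand it the raw `testing` frame.

```r
# More trees at shrinkage = 0.01, so the loss has room to keep decreasing
gbmGrid2 <- expand.grid(.interaction.depth = 1:3, .n.trees = (1:10)*50,
                        .shrinkage = 0.01, .n.minobsinnode = 10)
gbmFit2 <- train(label ~ ., data = training, method = "gbm",
                 trControl = fitControl, verbose = 0,
                 tuneGrid = gbmGrid2, metric = "logLoss")

# Test-set log loss: build the data frame shape that mnLogLoss() requires,
# i.e. an `obs` column plus one probability column per class level
probs   <- predict(gbmFit2, newdata = testing, type = "prob")
eval_df <- data.frame(obs = testing$label, probs)
mnLogLoss(eval_df, lev = levels(testing$label))
```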