XGBoost using Hyperopt. Facing issues while Hyper-Parameter Tuning

I am trying to tune the hyperparameters of an XGBoost classifier using Hyperopt, but I am running into an error. Please find the code I am using and the error below:

Step 1: Objective function

import csv
import numpy as np
import xgboost as xgb
from hyperopt import STATUS_OK
from timeit import default_timer as timer
MAX_EVALS = 200
N_FOLDS = 10
def objective(params, n_folds = N_FOLDS):
    """Objective function for XGBoost Hyperparameter Optimization"""
    # Keep track of evals
    global ITERATION
    ITERATION += 1
#     # Retrieve the subsample if present otherwise set to 1.0
#     subsample = params['boosting_type'].get('subsample', 1.0)
#     # Extract the boosting type
#     params['boosting_type'] = params['boosting_type']['boosting_type']
#     params['subsample'] = subsample
    # Make sure parameters that need to be integers are integers
    for parameter_name in ['max_depth', 'colsample_bytree', 
                          'min_child_weight']:
        params[parameter_name] = int(params[parameter_name])
    start = timer()
    # Perform n_folds cross validation
    cv_results = xgb.cv(params, train_set, num_boost_round = 10000, 
                       nfold = n_folds, early_stopping_rounds = 100, 
                       metrics = 'auc', seed = 50)
    run_time = timer() - start
    # Extract the best score
    best_score = np.max(cv_results['auc-mean'])
    # Loss must be minimized
    loss = 1 - best_score
    # Boosting rounds that returned the highest cv score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    # Write to the csv file ('a' means append)
    of_connection = open(out_file, 'a')
    writer = csv.writer(of_connection)
    writer.writerow([loss, params, ITERATION, n_estimators, 
                   run_time])
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'iteration': ITERATION,
           'estimators': n_estimators, 'train_time': run_time, 
           'status': STATUS_OK}

I have already defined the search space and the optimization algorithm. While running Hyperopt, I get the error below. The error is in the objective function.

Error: KeyError: 'auc-mean'

<ipython-input-62-8d4e97f16929> in objective(params, n_folds)
     25     run_time = timer() - start
     26     # Extract the best score
---> 27     best_score = np.max(cv_results['auc-mean'])
     28     # Loss must be minimized
     29     loss = 1 - best_score

First, print cv_results and see which keys exist.

In the example notebook below, the keys are: 'test-auc-mean' and 'train-auc-mean'.

See cell 5 here: https://www.kaggle.com/tilii7/bayesian-optimization-of-xgboost-parameters

@avvinci is correct. Let me explain a little further.

cv_results = xgb.cv(params, train_set, num_boost_round = 10000, 
                       nfold = n_folds, early_stopping_rounds = 100, 
                       metrics = 'auc', seed = 50)

This is xgboost cross-validation, and it returns the evaluation history. The history is essentially a pandas DataFrame. The column names in the DataFrame depend on what is being passed in, namely train, test, and the evaluation metric.

best_score = np.max(cv_results['auc-mean'])

Here you are looking up the best auc in the evaluation history, but the columns are called

'test-auc-mean' and 'train-auc-mean'

as @avvinci suggested. The column name 'auc-mean' does not exist, so pandas throws the KeyError. Use 'train-auc-mean' for the best auc on the training set, or 'test-auc-mean' for the best auc on the test set.
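A minimal sketch of the fix, using a dummy evaluation history with the same column names xgb.cv returns (the values are made up for illustration; the real DataFrame comes from xgb.cv):

```python
import numpy as np
import pandas as pd

# Stand-in for the DataFrame returned by xgb.cv: note the train-/test- prefixes.
cv_results = pd.DataFrame({
    'train-auc-mean': [0.80, 0.85, 0.88, 0.87],
    'train-auc-std':  [0.01, 0.01, 0.01, 0.01],
    'test-auc-mean':  [0.78, 0.82, 0.84, 0.83],
    'test-auc-std':   [0.02, 0.02, 0.02, 0.02],
})

# cv_results['auc-mean'] would raise KeyError: the key needs the prefix.
best_score = np.max(cv_results['test-auc-mean'])                # 0.84
# Boosting rounds that produced the highest cv score (1-indexed)
n_estimators = int(np.argmax(cv_results['test-auc-mean']) + 1)  # 3
loss = 1 - best_score
```

The rest of the objective function stays the same; only the column name changes.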

If you are in doubt, just run the cross-validation once outside the objective function and call head() on cv_results.