XGBoost using Hyperopt. Facing issues while Hyper-Parameter Tuning
I am trying to do hyperparameter tuning for an XGBoost classifier using Hyperopt, but I am running into an error. The code I am using and the error are below:
Step 1: Objective function
import csv
import numpy as np
import xgboost as xgb
from hyperopt import STATUS_OK
from timeit import default_timer as timer

MAX_EVALS = 200
N_FOLDS = 10

def objective(params, n_folds=N_FOLDS):
    """Objective function for XGBoost Hyperparameter Optimization"""
    # Keep track of evals
    global ITERATION
    ITERATION += 1
    # Make sure parameters that need to be integers are integers
    for parameter_name in ['max_depth', 'colsample_bytree',
                           'min_child_weight']:
        params[parameter_name] = int(params[parameter_name])
    start = timer()
    # Perform n_folds cross validation
    # (train_set is an xgb.DMatrix defined elsewhere)
    cv_results = xgb.cv(params, train_set, num_boost_round=10000,
                        nfold=n_folds, early_stopping_rounds=100,
                        metrics='auc', seed=50)
    run_time = timer() - start
    # Extract the best score
    best_score = np.max(cv_results['auc-mean'])
    # Loss must be minimized
    loss = 1 - best_score
    # Boosting rounds that returned the highest cv score
    n_estimators = int(np.argmax(cv_results['auc-mean']) + 1)
    # Write to the csv file ('a' means append)
    # (out_file is the results path defined elsewhere)
    with open(out_file, 'a') as of_connection:
        writer = csv.writer(of_connection)
        writer.writerow([loss, params, ITERATION, n_estimators, run_time])
    # Dictionary with information for evaluation
    return {'loss': loss, 'params': params, 'iteration': ITERATION,
            'estimators': n_estimators, 'train_time': run_time,
            'status': STATUS_OK}
I have already defined the search space and the optimization algorithm. When I run Hyperopt, I get the error below. The error is inside the objective function.
Error: KeyError: 'auc-mean'
<ipython-input-62-8d4e97f16929> in objective(params, n_folds)
25 run_time = timer() - start
26 # Extract the best score
---> 27 best_score = np.max(cv_results['auc-mean'])
28 # Loss must be minimized
29 loss = 1 - best_score
First, print cv_results and see which keys actually exist.
In the example notebook below, the keys are 'test-auc-mean' and 'train-auc-mean'.
See cell 5 here:
https://www.kaggle.com/tilii7/bayesian-optimization-of-xgboost-parameters
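A quick way to check is to list the columns before indexing into the history. Sketched below with a mocked DataFrame standing in for the real xgb.cv output (the values are made up; only the column naming pattern matches what xgb.cv produces with metrics='auc'):

```python
import pandas as pd

# Stand-in for the evaluation history returned by xgb.cv
# (hypothetical numbers; real runs produce these same column names)
cv_results = pd.DataFrame({
    'train-auc-mean': [0.80, 0.85, 0.88],
    'train-auc-std':  [0.01, 0.01, 0.02],
    'test-auc-mean':  [0.78, 0.82, 0.81],
    'test-auc-std':   [0.02, 0.02, 0.03],
})

# Inspect the available keys before using them
print(list(cv_results.columns))
# Note: 'auc-mean' is absent; only train-/test- prefixed columns exist
```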
@avvinci is correct. Let me explain a bit more.
cv_results = xgb.cv(params, train_set, num_boost_round = 10000,
nfold = n_folds, early_stopping_rounds = 100,
metrics = 'auc', seed = 50)
This is XGBoost cross-validation, and it returns the evaluation history. The history is essentially a pandas DataFrame. The column names in the DataFrame depend on what is being passed in, e.g. train, test, and the evaluation metric.
best_score = np.max(cv_results['auc-mean'])
Here you are looking for the best AUC in the evaluation history, but those columns are called
'test-auc-mean' and 'train-auc-mean'
as @avvinci suggested. The column name 'auc-mean' does not exist, hence the KeyError. Use 'train-auc-mean' for the best AUC on the training folds, or 'test-auc-mean' for the best AUC on the held-out folds.
If in doubt, just run the cross-validation outside the objective function and call head() on cv_results.
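Putting the fix together, the score-extraction lines of the objective would read as follows. This is a sketch that substitutes a mocked history for the real xgb.cv return value, so it can be run standalone; the two corrected lines are the ones that index 'test-auc-mean':

```python
import numpy as np
import pandas as pd

# Mocked evaluation history standing in for xgb.cv's return value
cv_results = pd.DataFrame({'test-auc-mean':  [0.78, 0.82, 0.81],
                           'train-auc-mean': [0.80, 0.85, 0.88]})

# Best held-out score: index 'test-auc-mean', not 'auc-mean'
best_score = np.max(cv_results['test-auc-mean'])
# Loss must be minimized
loss = 1 - best_score
# Boosting rounds that produced the highest cv score (1-based count)
n_estimators = int(np.argmax(cv_results['test-auc-mean']) + 1)
print(loss, n_estimators)
```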