Hyper-parameter Tuning for XGBoost for Multi-class Target Variable
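I have a multi-class classification problem (the target must be predicted as 1, 2 or 3) that I am trying to solve with XGBoost. I am trying to tune my hyperparameters with a randomized search. Here is my code:
I tried changing the 'scoring' parameter in 'param_distributions' from 'auc_roc' to 'precision', 'f1_samples' and 'jaccard' (which raised a different error related to the 'average' parameter, because my problem is multi-class).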
loss = ['hinge','log','modified_huber','squared_hinge','perceptron']
penalty = ['l1','l2','elasticnet']
alpha = [0.0001, 0.001,0.01,0.1,1,10,100,1000]
learnin_rate = ['constant','optimal','invscaling','adaptive']
class_weight = [{0.3,0.5,0.2},{0.3,0.4,0.3}]
eta0 = [1,10,100]
xg_class = xgb.XGBClassifier(objective = "multi:softmax", colsample_bytree = 1,
gamma = 1,subsample = 0.8, learning_rate = 0.01, max_depth = 3,
alpha = 10,n_estimators = 1000, multilabel_ =True, num_classes = 3)
from sklearn.metrics import jaccard_score
param_distributions = dict(loss = loss, penalty=penalty, alpha=alpha, learnin_rate=learnin_rate, class_weight=class_weight, eta0=eta0)
random = RandomizedSearchCV(estimator = xg_class, param_distributions=param_distributions,
scoring = jaccard_score(y_true=Y_miss_xgb_test, y_pred = preds_miss_xgb, average = 'micro'),
verbose = 1, n_jobs =-1, n_iter = 1000)
random_result = random.fit(X_miss_xgb_train, Y_miss_xgb_train)
The error I get is:
ValueError: scoring should either be a single string or callable for
single metric evaluation or a list/tuple of strings or a dict of
scorer name mapped to the callable for multiple metric evaluation. Got
0.3996569468267582 of type
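RandomizedSearchCV expects the "scoring" argument to be a single string or a callable for single-metric evaluation, or a list/tuple of strings or a dict mapping scorer names to callables for multiple-metric evaluation, but a float was passed. The call jaccard_score(y_true=Y_miss_xgb_test, y_pred=preds_miss_xgb, average='micro') is evaluated immediately, before the search starts, and returns a float score (exactly 0.3996569468267582).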
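You can specify the Jaccard score by its scorer name as a string instead. The name is not "jaccard_score" (that is the metric function); scikit-learn registers the scorers "jaccard", "jaccard_micro", "jaccard_macro", "jaccard_weighted" and "jaccard_samples". Plain "jaccard" assumes a binary target, so for micro averaging on a multi-class target use "jaccard_micro":
random = RandomizedSearchCV(estimator = xg_class,
                            param_distributions = param_distributions,
                            scoring = "jaccard_micro",
                            verbose = 1,
                            n_jobs = -1,
                            n_iter = 1000)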
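If you prefer to pass the metric function with explicit arguments, another option is to wrap it with make_scorer, which gives RandomizedSearchCV the callable it expects. The snippet below is only a minimal sketch under several assumptions: the data is synthetic, the parameter ranges are illustrative, and scikit-learn's GradientBoostingClassifier stands in for xgb.XGBClassifier so that it runs without xgboost installed (XGBClassifier accepts the same four parameter names used here). Note also that the names in the question's param_distributions (loss, penalty, eta0 and so on) are SGDClassifier parameters, so the search space should be built from the boosting model's own parameters.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import jaccard_score, make_scorer
from sklearn.model_selection import RandomizedSearchCV

# Synthetic 3-class data with labels 1, 2, 3 (a stand-in for the real data set).
X, y = make_classification(n_samples=150, n_features=8, n_informative=5,
                           n_classes=3, random_state=0)
y = y + 1

# Wrap the metric in a scorer: a callable, not a precomputed float.
jaccard_micro = make_scorer(jaccard_score, average="micro")

# Parameters that belong to the boosting model itself (illustrative values).
param_distributions = {
    "learning_rate": [0.01, 0.05, 0.1, 0.3],
    "max_depth": [2, 3, 4],
    "subsample": [0.6, 0.8, 1.0],
    "n_estimators": [20, 50],
}

search = RandomizedSearchCV(
    estimator=GradientBoostingClassifier(random_state=0),  # stand-in estimator
    param_distributions=param_distributions,
    scoring=jaccard_micro,
    n_iter=4,   # kept small so the sketch runs quickly
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

When swapping in xgb.XGBClassifier, be aware that recent xgboost versions expect class labels to start at 0, so labels 1, 2, 3 may need to be encoded first (for example with sklearn.preprocessing.LabelEncoder).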