Light GBM Value Error: ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

Here is my code. This is a binary classification problem, and the evaluation metric is the AUC score. I found a solution on Stack Overflow and implemented it, but it didn't work; I still get the same error.

param_grid = {
    'n_estimators': [1000, 10000],
    'boosting_type': ['gbdt'],
    'num_leaves': [30, 35],
    # 'learning_rate': [0.01, 0.02, 0.05],
    # 'colsample_bytree': [0.8, 0.95],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    # 'reg_alpha': [0.01, 0.02, 0.05],
    # 'reg_lambda': [0.01, 0.02, 0.05],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(random_state=42, early_stopping_rounds=10, eval_metric='auc', verbose_eval=20)


grid_search = GridSearchCV(lgb, param_grid=param_grid,
                           scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)

grid_search.fit(X_train, y_train, eval_set=(X_val, y_val))

best_model = grid_search.best_estimator_
start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)

The error occurs at best_model.fit(X_train, y_train).

Answer

This error occurs because you used early stopping during the grid search, but then chose not to use it when fitting the best model on the full dataset.

Some of the keyword arguments you pass to LGBMClassifier are added to params in the model object produced by training, including early_stopping_rounds.

To disable early stopping, you can use set_params().

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)

More details

I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing this yourself when asking questions here. It will help you get better answers, faster.

I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0, using Python 3.8.3 on a Mac.

I made the following changes to your example so it is easier to work with:

  • removed the commented-out code
  • reduced n_estimators to [10, 100] and num_leaves to [8, 10] so training runs much faster
  • added the imports
  • added a specific dataset and the code to reproduce it

Reproducible example

from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)

grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)


X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)
grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# remove early_stopping_rounds
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)