LightGBM ValueError: For early stopping, at least one dataset and eval metric is required for evaluation
Here is my code. This is a binary classification problem, and the evaluation metric is the AUC score. I found a solution on Stack Overflow and implemented it, but it didn't work and still gives me an error.
param_grid = {
    'n_estimators': [1000, 10000],
    'boosting_type': ['gbdt'],
    'num_leaves': [30, 35],
    #'learning_rate': [0.01, 0.02, 0.05],
    #'colsample_bytree': [0.8, 0.95],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    #'reg_alpha': [0.01, 0.02, 0.05],
    #'reg_lambda': [0.01, 0.02, 0.05],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(random_state=42, early_stopping_rounds=10, eval_metric='auc', verbose_eval=20)
grid_search = GridSearchCV(lgb, param_grid=param_grid,
                           scoring='roc_auc', cv=5, n_jobs=-1, verbose=1)
grid_search.fit(X_train, y_train, eval_set=(X_val, y_val))
best_model = grid_search.best_estimator_

start = time()
best_model.fit(X_train, y_train)
Train_time = round(time() - start, 4)
The error occurs at best_model.fit(X_train, y_train).
Answer
This error happens because you used early stopping during the grid search, but then chose not to use it when fitting the best model on the full training set. Some of the keyword arguments you pass to LGBMClassifier are stored in the params of the resulting model object, including early_stopping_rounds.

To disable early stopping, you can use set_params().
best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# disable early stopping by setting early_stopping_rounds to None
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)
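Alternatively, if you would rather keep early stopping for this final fit, you can supply a validation set to fit() instead of clearing the parameter, since the estimator still carries early_stopping_rounds=10 and eval_metric='auc'. A minimal sketch, assuming the same X_val / y_val split from your code:

# keep early stopping: provide an eval set so the 'auc' metric can be evaluated
best_model.fit(X_train, y_train, eval_set=[(X_val, y_val)])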
More details
I made some assumptions to turn your question into a minimal reproducible example. In the future, I recommend doing this when asking questions here. It will help you get better answers, faster.
I installed lightgbm 3.1.0 with pip install lightgbm==3.1.0, using Python 3.8.3 on a Mac.
I made the following changes to your example so it is easier to work with:
- removed the commented-out code
- reduced n_estimators to [10, 100] and num_leaves to [8, 10], so training runs much faster
- added the imports
- added a specific dataset and the code to generate it reproducibly
Reproducible example
from lightgbm import LGBMClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

param_grid = {
    'n_estimators': [10, 100],
    'boosting_type': ['gbdt'],
    'num_leaves': [8, 10],
    'subsample': [0.8, 0.95],
    'is_unbalance': [True, False],
    'min_split_gain': [0.01, 0.02, 0.05]
}

lgb = LGBMClassifier(
    random_state=42,
    early_stopping_rounds=10,
    eval_metric='auc',
    verbose_eval=20
)
grid_search = GridSearchCV(
    lgb,
    param_grid=param_grid,
    scoring='roc_auc',
    cv=5,
    n_jobs=-1,
    verbose=1
)

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X,
    y,
    test_size=0.1,
    random_state=42
)

grid_search.fit(
    X_train,
    y_train,
    eval_set=(X_test, y_test)
)

best_model = grid_search.best_estimator_

# ---------------- my added code -----------------------#
# inspect current parameters
params = best_model.get_params()
print(params)

# disable early stopping by setting early_stopping_rounds to None
params["early_stopping_rounds"] = None
best_model.set_params(**params)
# ------------------------------------------------------#

best_model.fit(X_train, y_train)
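To sanity-check the refit model, you can score it with the question's metric (AUC) on the held-out split. A small check using scikit-learn's roc_auc_score:

from sklearn.metrics import roc_auc_score

# predicted probability of the positive class on the held-out data
y_proba = best_model.predict_proba(X_test)[:, 1]
print(f"held-out AUC: {roc_auc_score(y_test, y_proba):.4f}")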