`sklearn` asking for eval dataset when there is one

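I am working with sklearn's StackingRegressor, and I use lightgbm to train one of my models. My lightgbm model has early stopping enabled, and I pass it an eval dataset and an eval metric for that purpose.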

When the model is fed into StackingRegressor, I see this error:

ValueError: For early stopping, at least one dataset and eval metric is required for evaluation

This is frustrating, because my code does provide both. What is going on? Here is my code.

import numpy as np 
import pandas as pd 

import lightgbm as lgb
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
import xgboost as xgb
from sklearn.ensemble import StackingRegressor

opt_parameters_LGBM = {'bagging_fraction': 0.37031434827212084, 'bagging_seed': 47, 'boosting_type': 'gbdt', 
                       'feature_fraction': 0.3894822966866982, 'learning_rate': 0.01, 'max_bin': 177, 'max_depth': -1, 
                       'metric': 'rmse', 'min_child_weight': 1000.0, 'num_leaves': 161, 'objective': 'regression', 
                       'random_state': 47, 'reg_alpha': 10, 'reg_lambda': 50, 'verbosity': -1}  
m1 = lgb.LGBMRegressor(valid_sets = [lgb_train, lgb_eval], verbose_eval = 30, num_boost_round = 10000, early_stopping_rounds = 10, n_jobs=4, n_estimators=3000, **opt_parameters_LGBM)
m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse')

opt_parameters_ADA = {'learning_rate': 0.03, 'n_estimators': 5} 
m2 = AdaBoostRegressor(base_estimator=DecisionTreeRegressor(max_depth=3, min_samples_leaf=1, min_impurity_decrease=10, random_state=47), random_state=47, **opt_parameters_ADA)
m2.fit(X_train_df, y_train_df)

'''
Where problem starts
'''

gbm = xgb.XGBRegressor(
 learning_rate = 0.02,
 n_estimators= 5,
 max_depth= 4,
 min_child_weight= 2,
 gamma=0.9,                        
 subsample=0.8,
 colsample_bytree=0.8,
 objective= 'reg:squaredlogerror',
 nthread= -1,
 verbosity=3,
 random_state=20)

estimators = [('lgbm', m1), ('ada', m2)]

gbm = StackingRegressor(estimators=estimators, final_estimator=gbm, cv=5, verbose=1)
gbm.fit(X_train_df, y_train_df)

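My guess is that the problem is caused by early_stopping_rounds being set on the LGBMRegressor. StackingRegressor() refits clones of each base estimator internally, and your gbm.fit(X_train_df, y_train_df) call gives it no eval data to pass along, so the early-stopping check fails inside the stack even though your own m1.fit call had an eval_set.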

Try the following:

Right after the line where you fit your LGBMRegressor() model, that is, m1.fit(X_train_df, y_train_df, eval_set = (X_val_df, y_val_df), eval_metric = 'rmse'), add these lines.

params = m1.get_params()

# disable early stopping, since the model has already been fitted to the data
params["early_stopping_rounds"] = None
m1.set_params(**params)

See if the error goes away.
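To make the mechanism easier to see without depending on a particular LightGBM version, here is a minimal sketch. EarlyStopToy is a hypothetical stand-in that I made up for illustration (it is not LightGBM code); it only mimics the check that raises the error. The data is random and the final estimator is a plain LinearRegression, both chosen just to keep the example small.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import LinearRegression


class EarlyStopToy(RegressorMixin, BaseEstimator):
    """Hypothetical stand-in for an estimator with early stopping (not LightGBM)."""

    def __init__(self, early_stopping_rounds=None):
        self.early_stopping_rounds = early_stopping_rounds

    def fit(self, X, y, eval_set=None):
        # Mimic the check behind the error message in the question.
        if self.early_stopping_rounds is not None and eval_set is None:
            raise ValueError("For early stopping, at least one dataset "
                             "and eval metric is required for evaluation")
        self.mean_ = float(np.mean(y))  # trivial "model": always predict the mean
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)


rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = rng.normal(size=20)

m1 = EarlyStopToy(early_stopping_rounds=10)
m1.fit(X, y, eval_set=[(X, y)])  # fitting directly with eval data works

stack = StackingRegressor(estimators=[("toy", m1)],
                          final_estimator=LinearRegression())
try:
    stack.fit(X, y)  # clones of m1 are refit here, with no eval_set
    stacking_error = None
except ValueError as exc:
    stacking_error = str(exc)
print("error inside the stack:", stacking_error)

# Same idea as the fix above: switch early stopping off before stacking.
m1.set_params(early_stopping_rounds=None)
stack.fit(X, y)
preds = stack.predict(X)
print("stack fitted, predictions shape:", preds.shape)
```

Note that the earlier m1.fit call does not carry over: the stack works on fresh clones built from m1's parameters, which is why changing the parameter (rather than refitting m1) is what makes the difference.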