Unable to tune hyperparameters for CatBoostRegressor

Question

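I'm trying to fit a CatBoostRegressor to my data. When I run K-fold CV on the baseline model, everything works fine. But when I use Optuna for hyperparameter tuning, something very strange happens: it runs the first trial and then throws the following error: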

[I 2021-08-26 08:00:56,865] Trial 0 finished with value: 0.7219653113910736 and parameters: {'model__depth': 2, 'model__iterations': 1715, 'model__subsample': 0.5627211605250965, 'model__learning_rate': 0.15601805222619286}. Best is trial 0 with value: 0.7219653113910736.
[W 2021-08-26 08:00:56,869] Trial 1 failed because of the following error: CatBoostError("You can't change params of fitted model.")
Traceback (most recent call last):

I used a similar approach for XGBRegressor and LGBM, and both worked fine. So why do I get this error with CatBoost?

Here is my code:

cat_cols = [cname for cname in train_data1.columns
            if train_data1[cname].dtype == 'object']
num_cols = [cname for cname in train_data1.columns
            if train_data1[cname].dtype in ['int64', 'float64']]


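from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Numeric columns: mean imputation, then standard scaling
num_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='mean')),
                            ('scale', StandardScaler())])
# Categorical columns: most-frequent imputation, then one-hot encoding
cat_trans = Pipeline(steps=[('impute', SimpleImputer(strategy='most_frequent')),
                            ('encode', OneHotEncoder(handle_unknown='ignore'))])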

from sklearn.compose import ColumnTransformer

preproc = ColumnTransformer(transformers=[('cat', cat_trans, cat_cols),
                                          ('num', num_trans, num_cols)])


from catboost import CatBoostRegressor

cbr_model = CatBoostRegressor(random_state=69,
                              loss_function='RMSE',
                              eval_metric='RMSE',
                              leaf_estimation_method='Newton',
                              bootstrap_type='Bernoulli',
                              task_type='GPU')

pipe = Pipeline(steps = [('preproc', preproc), ('model', cbr_model)])


import numpy as np
import optuna
from sklearn.metrics import mean_squared_error

def objective(trial):
    model__depth = trial.suggest_int('model__depth', 2, 10)
    model__iterations = trial.suggest_int('model__iterations', 100, 2000)
    model__subsample = trial.suggest_float('model__subsample', 0.0, 1.0)
    model__learning_rate = trial.suggest_float('model__learning_rate',
                                               0.001, 0.3, log=True)

    params = {'model__depth' : model__depth,
              'model__iterations' : model__iterations,
              'model__subsample' : model__subsample, 
              'model__learning_rate' : model__learning_rate}

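    # `pipe` (and the CatBoostRegressor inside it) is defined once, outside
    # objective(), and is reused by every trial.
    pipe.set_params(**params)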
    pipe.fit(train_x, train_y)
    pred = pipe.predict(test_x)

    return np.sqrt(mean_squared_error(test_y, pred))

cbr_study = optuna.create_study(direction = 'minimize')
cbr_study.optimize(objective, n_trials = 10)

Answer 1

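Apparently, CatBoost has a mechanism that prevents you from changing the parameters of a model once it has been fitted, so you have to create a new CatBoost model object for each trial. I opened an issue about this on GitHub, and they said it was implemented to protect the results of long training runs. That makes no sense to me!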

As of now, the only workaround for this issue is to create a new CatBoost model for each trial!
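To see why reusing one object fails while a fresh object per trial works, here is a stdlib-only sketch. `FitOnceRegressor` is a hypothetical stand-in that only imitates the behaviour reported in the error message (no parameter changes after the first fit); it is not CatBoost's actual code.

```python
class FitOnceRegressor:
    """Hypothetical stand-in: parameters may only be changed before the first fit()."""

    def __init__(self, depth=6):
        self.depth = depth
        self._fitted = False

    def set_params(self, **params):
        # Imitates the guard behind "You can't change params of fitted model."
        if self._fitted:
            raise RuntimeError("You can't change params of fitted model.")
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        self._fitted = True
        return self


def run_trials_shared(depths):
    """Question's pattern: one model object reused across all trials."""
    model = FitOnceRegressor()
    completed = []
    for d in depths:
        try:
            model.set_params(depth=d).fit([[0]], [0])
            completed.append(d)
        except RuntimeError:
            break  # the second trial fails, as in the Optuna log
    return completed


def run_trials_fresh(depths):
    """Workaround: a brand-new model object is created for every trial."""
    completed = []
    for d in depths:
        FitOnceRegressor(depth=d).fit([[0]], [0])
        completed.append(d)
    return completed


print(run_trials_shared([2, 4, 6]))  # [2]
print(run_trials_fresh([2, 4, 6]))   # [2, 4, 6]
```

Only the first trial completes with the shared object, which matches the log in the question: trial 0 finishes and trial 1 fails.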

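If you use the Pipeline approach with Optuna, a more sensible alternative is to define the pipeline instance and the model instance inside the Optuna objective function, so that every trial builds its own. Then, after tuning, define the final pipeline instance again outside the function using the best parameters.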

This way, even if you run 50 trials, you don't have to define 50 instances by hand!
