Although I used GridSearchCV, I get a worse score than with the model where I never set any parameters. What could be the reason?

Question

When I try to tune my model, it gives me a worse score than before. Here is my code:

Before tuning

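import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error

rf_model = RandomForestRegressor(random_state=42).fit(X_train, y_train)
y_pred = rf_model.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
rmse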

This gave: 344.73852779396566

But when I tried GridSearchCV:

rf_params = {"max_depth":[5,8,10],
             "max_features":[2,5,10],
             "n_estimators":[200,500,100,2000],
             "min_samples_split":[2,10,80]}

from sklearn.model_selection import GridSearchCV

rf_cv = GridSearchCV(rf_model, rf_params,
                     cv=10, verbose=2, n_jobs=-1).fit(X_train, y_train)
rf_cv.best_params_

It gave me these best parameters:

{'max_depth': 8,
 'max_features': 2,
 'min_samples_split': 2,
 'n_estimators': 200}

Then I trained the model again with these parameters:

Tuned

rf_tunned = RandomForestRegressor(max_depth=8,
                                  max_features = 2,
                                  min_samples_split = 2,
                                  n_estimators = 200).fit(X_train, y_train)

y_pred = rf_tunned.predict(X_test)

rmse = np.sqrt(mean_squared_error(y_test, y_pred))
rmse

This gave me an RMSE of 350.14634045283685.

What is the reason for this? Isn't model tuning supposed to give better results?

Answer 1

There are two things to keep in mind:

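1. By simply running RandomForestRegressor(random_state=42), you fall back to the default values of all parameters (except random_state), as described in the documentation.
2. Grid search is not "magic" or exhaustive; it only tests the combinations of the values you list in the grid.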
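To make the second point concrete, here is a minimal sketch in plain Python (no scikit-learn needed) of how such a grid is expanded: the candidates are just the Cartesian product of the listed values, so nothing outside those lists is ever tried. `expand_grid` is a hypothetical helper written for illustration; scikit-learn's own `ParameterGrid` does the equivalent enumeration internally.

```python
from itertools import product

# The grid from the question, copied verbatim.
rf_params = {"max_depth": [5, 8, 10],
             "max_features": [2, 5, 10],
             "n_estimators": [200, 500, 100, 2000],
             "min_samples_split": [2, 10, 80]}

def expand_grid(grid):
    """Hypothetical helper: return every combination of the listed values."""
    keys = sorted(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]

candidates = expand_grid(rf_params)
print(len(candidates))       # 3 * 3 * 4 * 3 = 108 candidate settings
print(len(candidates) * 10)  # with cv=10, that is 1080 model fits
```

Every candidate draws max_depth from [5, 8, 10] and max_features from [2, 5, 10]; no other value can appear.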

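Looking at the default parameter values in the docs, it turns out that your RandomForestRegressor(random_state=42) run is actually equivalent to the following parameter settings: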

{'max_depth': None,  # full tree depth
 'max_features': 20, # all features (default)
 'min_samples_split': 2,
 'n_estimators': 100}

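This combination is not included in the grid you specified, so it was never tried: max_depth=None and max_features=20 both lie outside your lists [5, 8, 10] and [2, 5, 10]. It is therefore not surprising that it gives a lower error than the best model found by your specific grid search.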
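A small sketch of how you could check this yourself, again in plain Python. The `default_params` dict is the one listed above (its max_features=20 assumes, as the answer does, that X_train has 20 columns), and `outside_grid` is a hypothetical helper. One possible follow-up, shown at the end, is to append the default values to the grid so that the baseline is at least among the candidates; this does not by itself guarantee that the search will select it.

```python
from itertools import product

# Grid from the question.
rf_params = {"max_depth": [5, 8, 10],
             "max_features": [2, 5, 10],
             "n_estimators": [200, 500, 100, 2000],
             "min_samples_split": [2, 10, 80]}

# Effective settings of RandomForestRegressor(random_state=42) as listed above
# (assumption: 20 = all features in the asker's X_train).
default_params = {"max_depth": None, "max_features": 20,
                  "min_samples_split": 2, "n_estimators": 100}

def outside_grid(params, grid):
    """Hypothetical helper: parameters whose value is not among the grid's listed values."""
    return {k: v for k, v in params.items() if v not in grid.get(k, [])}

print(outside_grid(default_params, rf_params))
# {'max_depth': None, 'max_features': 20}

# Widen a copy of the grid so every default value is also a candidate.
widened = {k: list(v) for k, v in rf_params.items()}
for k, v in default_params.items():
    if v not in widened[k]:
        widened[k].append(v)

keys = sorted(widened)
combos = [dict(zip(keys, vals)) for vals in product(*(widened[k] for k in keys))]
print(default_params in combos)  # True: the baseline is now one of the candidates
```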


