How to get the best estimator from GridSearchCV (Random Forest Classifier, scikit-learn)

Question

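I'm running GridSearchCV to optimize the parameters of a classifier in scikit-learn. Once it's done, I'd like to know which parameters were chosen as the best.

Whenever I do so I get an AttributeError: 'RandomForestClassifier' object has no attribute 'best_estimator_', and I can't tell why, as it seems to be a legitimate attribute according to the documentation.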

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

X = data[usable_columns]
y = data[target]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['auto', 'sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)

print('\n', CV_rfc.best_estimator_)

Yields:

AttributeError: 'GridSearchCV' object has no attribute 'best_estimator_'

Answer 1

You have to fit your data before you can get the best parameter combination.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Build a classification task using 3 informative features
X, y = make_classification(n_samples=1000,
                           n_features=10,
                           n_informative=3,
                           n_redundant=0,
                           n_repeated=0,
                           n_classes=2,
                           random_state=0,
                           shuffle=False)

rfc = RandomForestClassifier(n_jobs=-1, max_features='sqrt', n_estimators=50, oob_score=True)

# 'auto' (an alias for 'sqrt' in classifiers) is dropped: scikit-learn >= 1.3 rejects it
param_grid = {
    'n_estimators': [200, 700],
    'max_features': ['sqrt', 'log2']
}

CV_rfc = GridSearchCV(estimator=rfc, param_grid=param_grid, cv=5)
CV_rfc.fit(X, y)  # the best_* attributes only exist after fitting
print(CV_rfc.best_params_)
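
To make the point about fitting concrete, here is a minimal sketch that checks for best_estimator_ before and after calling fit. The dataset size and grid values are small placeholders chosen only so it runs quickly; they are not the asker's settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset (illustrative values only)
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]},
    cv=3,
)

# Before fit, the attribute has not been set, so accessing it would raise AttributeError
fitted_before = hasattr(search, "best_estimator_")

search.fit(X, y)

# After fit, the search has stored its results on the object
fitted_after = hasattr(search, "best_estimator_")

print(fitted_before, fitted_after)
print(search.best_params_)
```

The first print should show that the attribute is missing before fit and present afterwards, which is the same situation as the AttributeError in the question.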

Answer 2

Just adding one more point to keep it clear.

The documentation says the following:

best_estimator_ : estimator or dict:

Estimator that was chosen by the search, i.e. estimator which gave highest score (or smallest loss if specified) on the left out data.

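When the grid search is run over the various parameter combinations, it picks the one with the highest score according to the given scorer function. best_estimator_ is the estimator built with the parameters that produced that highest score.

Hence it can only be accessed after fitting the data.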
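
As a rough illustration of what is available once the search has been fitted, the sketch below compares best_estimator_, best_params_ and best_score_. The data and grid are placeholder assumptions, and the last part relies on the refit parameter of GridSearchCV (default True), which is not mentioned above: with refit=False the best parameters are still reported, but no best_estimator_ is created.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Small synthetic dataset and grid (illustrative values only)
X, y = make_classification(n_samples=200, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)
param_grid = {"n_estimators": [10, 20], "max_features": ["sqrt", "log2"]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
search.fit(X, y)

best_model = search.best_estimator_   # a fitted RandomForestClassifier
best_params = search.best_params_     # dict with one value per grid key
best_score = search.best_score_       # mean cross-validated score of the winner

# The winning estimator carries exactly the winning parameters
matches = all(best_model.get_params()[k] == v for k, v in best_params.items())
print(type(best_model).__name__, best_params, round(best_score, 3), matches)

# With refit=False the parameters are still reported, but no estimator is refit
no_refit = GridSearchCV(RandomForestClassifier(random_state=0), param_grid,
                        cv=3, refit=False).fit(X, y)
print(hasattr(no_refit, "best_params_"), hasattr(no_refit, "best_estimator_"))
```

If only the winning parameter values are needed, best_params_ is enough; best_estimator_ is the thing to use when you want a model that can call predict directly.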

