In GridSearchCV, how do I pass only the default parameters in param_grid?

Question

I'm a beginner, and I have the following code.

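from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit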

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
modelwithpca = GridSearchCV(pipeline, param_grid= ,cv=cv)
modelwithpca.fit(X_train,y_train)

This is a local test. What I want to accomplish is:

i. Perform PCA on the dataset

ii. Use Gaussian Naive Bayes with only its default parameters

iii. Use StratifiedShuffleSplit

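In the end, I want to pass the result of the steps above to another function that dumps the classifier, dataset, and feature list so I can test performance.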

dump_classifier_and_data(modelwithpca, dataset, features)

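For the param_grid argument, I don't want to test any list of parameters. I just want to use the default parameters of Gaussian Naive Bayes, if that makes sense. What do I need to change?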

Should anything also change in how I instantiate the classifier object?

Answer 1

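The purpose of GridSearchCV is to test at least one thing in your pipeline with different parameters (if you don't want to test different parameters, you don't need GridSearchCV). So, in general, if you want to test different values of n_components for PCA, the format for using a pipeline with GridSearchCV is: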

gscv = GridSearchCV(pipeline, param_grid={'{step_name}__{parameter_name}': [possible values]}, cv=cv)

For example:

# this would perform cv for the 3 different values of n_components for pca
gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)

If you tune PCA with GridSearchCV as shown above, this of course means your model will keep its default values.
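As a minimal sketch of the point above, the following runs the n_components grid and checks that the GaussianNB step still has its default parameters afterwards. The synthetic data from make_classification is an assumption standing in for X_train and y_train, and the variable names valid_keys, tuned_model_params, and default_model_params are introduced here only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (an assumption, not the asker's data).
X_train, y_train = make_classification(n_samples=200, n_features=12, random_state=0)

pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# Valid grid keys follow the '<step_name>__<parameter_name>' pattern.
valid_keys = sorted(k for k in pipeline.get_params() if '__' in k)

gscv = GridSearchCV(pipeline, param_grid={'pca__n_components': [3, 6, 10]}, cv=cv)
gscv.fit(X_train, y_train)

best_n_components = gscv.best_params_['pca__n_components']

# The GaussianNB step was never in the grid, so it keeps its constructor defaults.
tuned_model_params = gscv.best_estimator_.named_steps['model'].get_params()
default_model_params = GaussianNB().get_params()

print(valid_keys)
print(best_n_components, tuned_model_params == default_model_params)
```

Printing valid_keys is a quick way to see which names can go into param_grid for a given pipeline.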

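If you don't need parameter tuning, then GridSearchCV is not the way to go: passing your model's default parameters to GridSearchCV like this only produces a parameter grid with a single combination, so it is the same as just performing cross-validation. Doing it this way makes little sense, but if I understood your question correctly: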

from sklearn.naive_bayes import GaussianNB
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV, StratifiedShuffleSplit

pca = PCA()
model = GaussianNB()
steps = [('pca', pca), ('model', model)]
pipeline = Pipeline(steps)

cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)
# get the default parameters of your model and use them as a param_grid
modelwithpca = GridSearchCV(pipeline, param_grid={'model__' + k: [v] for k, v in model.get_params().items()}, cv=cv)

# fits once per CV split (5 here), then refits on all of X_train (refit=True by default)
modelwithpca.fit(X_train,y_train)
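Since a one-combination grid amounts to plain cross-validation, the same evaluation can be sketched without GridSearchCV by using cross_val_score. This is only an illustration under assumptions: the data comes from make_classification rather than the asker's dataset, and the final fit on the full training set is added here because cross_val_score does not return a fitted estimator.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import Pipeline

# Synthetic stand-in for X_train / y_train (an assumption, not the asker's data).
X_train, y_train = make_classification(n_samples=200, n_features=12, random_state=0)

# Both steps are created with their default parameters.
pipeline = Pipeline([('pca', PCA()), ('model', GaussianNB())])
cv = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=42)

# One accuracy score per split; no parameter grid is involved.
scores = cross_val_score(pipeline, X_train, y_train, cv=cv)
print(scores.mean())

# cross_val_score works on clones and keeps no fitted estimator, so fit the
# pipeline once on the full training data before passing it anywhere else.
pipeline.fit(X_train, y_train)
```

The fitted pipeline could then presumably be handed to a function such as the asker's dump_classifier_and_data in place of modelwithpca, assuming that function accepts any fitted scikit-learn estimator.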

Hope this helps, good luck!

python

machine-learning

scikit-learn

grid-search

sklearn-pandas