scikit-learn GridSearchCV and Pipeline using different methods
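I am trying to use GridSearchCV and a pipeline to evaluate these machine learning methods on the same data. It works when I vary the parameters of a single method, but I get an error when I put in multiple methods.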
pipe_steps = [
('scaler', StandardScaler()),
('logistic', LogisticRegression()),
('SVM',SVC()),
('KNN',KNeighborsClassifier())]
check_params={
'logistic__C':[1,1e5],
'SVM__C':[1,1e5],
'KNN__n_neighbors':[3,5],
'KNN__metric':['euclidean','manhattan']
}
pipeline = Pipeline(pipe_steps)
GridS = GridSearchCV(pipeline, param_grid=check_params)
GridS.fit(X, y)
print('Score %3.2f' %GridS.score(X, y))
print('Best Fit')
print(GridS.best_params_)
It gives an error on the pipeline line, as shown below:
TypeError Traceback (most recent call last)
<ipython-input-139-75960299bc1c> in <module>
13 }
14
---> 15 pipeline = Pipeline(pipe_steps)
16
17 BCX_Grid = GridSearchCV(pipeline, param_grid=check_params)
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\pipeline.py in __init__(self, steps, memory, verbose)
133 def __init__(self, steps, memory=None, verbose=False):
134 self.steps = steps
--> 135 self._validate_steps()
136 self.memory = memory
137 self.verbose = verbose
C:\ProgramData\Anaconda3\lib\site-packages\sklearn\pipeline.py in _validate_steps(self)
183 "transformers and implement fit and transform "
184 "or be the string 'passthrough' "
--> 185 "'%s' (type %s) doesn't" % (t, type(t)))
186
187 # We allow last estimator to be None as an identity transformation
TypeError: All intermediate steps should be transformers and implement fit and transform or be the string 'passthrough' 'LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
intercept_scaling=1, l1_ratio=None, max_iter=100,
multi_class='warn', n_jobs=None, penalty='l2',
random_state=None, solver='warn', tol=0.0001, verbose=0,
warm_start=False)' (type <class 'sklearn.linear_model.logistic.LogisticRegression'>) doesn't
Thanks.
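Your problem is not the hyperparameters, as they are defined correctly. The problem is that all intermediate steps of a pipeline must be transformers, as the error says. In your pipeline, LogisticRegression and SVC sit in intermediate positions, and neither of them is a transformer; only the last step may be a plain estimator.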
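To see which of the four objects can legally sit in the middle of a pipeline, a quick check is to ask each one whether it has a `transform` method. This is only a rough sketch, assuming a reasonably recent scikit-learn release; the `estimators` list and the `has_transform` name are made up for illustration.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Every step except the last one must offer fit and transform,
# so only objects with a transform method can be intermediate steps.
estimators = [StandardScaler(), LogisticRegression(), SVC(), KNeighborsClassifier()]
has_transform = {type(est).__name__: hasattr(est, "transform") for est in estimators}

for name, ok in has_transform.items():
    print(f"{name}: {'transformer' if ok else 'not a transformer'}")
```

Only the scaler passes the check, which is why stacking three classifiers one after another fails as soon as the pipeline is constructed.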
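In effect you need a separate pipeline for each model. For that I have a solution that uses a list of parameter grids, where each grid decides which estimator fills a step of the pipeline and which hyperparameters to try for it.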
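from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# 'model' is a placeholder step; each grid below swaps a real estimator into it.
pipeline = Pipeline([
    ('transformer', StandardScaler()),
    ('model', 'passthrough'),
])
# One grid per candidate model, each with that model's own hyperparameters.
params = [
    {
        'model': (LogisticRegression(),),
        'model__C': (1, 1e5),
    },
    {
        'model': (SVC(),),
        'model__C': (1, 1e5),
    },
    {
        'model': (KNeighborsClassifier(),),
        'model__n_neighbors': (3, 5),
        'model__metric': ('euclidean', 'manhattan'),
    },
]
grid_Search = GridSearchCV(pipeline, params)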
With this strategy you can define the steps of the pipeline dynamically.