Getting an error while trying to fit GridSearchCV
I am trying to fit a ridge regression model to my data using a pipeline and GridSearchCV.
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
X = transformed_data.iloc[:, :-1]
y = transformed_data['class']
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state = 1)
params = {}
params['ridge__alpha'] = np.arange(0, 100, 1).tolist()
t = [('labelenc',LabelEncoder() , [0]), ('stand', StandardScaler(), [1,2,3,4,5,6]), ('poly'),PolynomialFeatures(degree=2),[1,2,3,4,5,6] ]
transformer = ColumnTransformer(transformers=t)
pipe = Pipeline(steps=[('t', transformer), ('m',Ridge())])
#grid_ridge2_r2 = GridSearchCV(pipe, params, cv=10, scoring='r2', n_jobs=-1)
#results_ridge2_r2 = grid_ridge2_r2.fit(X_train,y_train)
grid_ridge2_rmse = GridSearchCV(pipe, params, cv=10, scoring='neg_root_mean_squared_error', n_jobs=-1)
results_ridge2_rmse = grid_ridge2_rmse.fit(X_train,y_train)
I keep getting
ValueError: too many values to unpack (expected 3)
on the last line, grid_ridge2_rmse.fit(X_train,y_train). My hunch is that something is wrong with the way I split the dataset.
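There are a few errors in your pipeline.
First, LabelEncoder cannot be used inside a scikit-learn pipeline, because it is meant to transform y, not X. Assuming you want to encode the categorical values of a feature, replace it with OrdinalEncoder.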
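To illustrate the difference between the two encoders, here is a minimal sketch on made-up toy data (the arrays X_cat and y_raw are hypothetical and not part of the question's dataset). It shows OrdinalEncoder working on a 2D feature matrix and LabelEncoder working on a 1D target vector.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OrdinalEncoder

# Hypothetical toy data: one categorical feature column and a target vector.
X_cat = np.array([["red"], ["green"], ["blue"], ["green"]])  # shape (4, 1)
y_raw = np.array(["yes", "no", "yes", "no"])                 # shape (4,)

# OrdinalEncoder follows the feature-transformer API: it expects 2D input X.
X_encoded = OrdinalEncoder().fit_transform(X_cat)

# LabelEncoder is meant for targets: it expects 1D input y.
y_encoded = LabelEncoder().fit_transform(y_raw)

print(X_encoded.ravel())  # categories are sorted: blue=0, green=1, red=2
print(y_encoded)          # classes are sorted: no=0, yes=1
```

Because ColumnTransformer passes each transformer a 2D slice of X, an encoder with the OrdinalEncoder-style interface is the one that fits there.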
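Then, to set a parameter in the grid, its key must follow the naming convention <step>__<hyperparameter>. In your case the Ridge step is named m, so the key should be m__alpha, not ridge__alpha.
The available pipeline parameters can be listed with pipe.get_params().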
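As a quick check of the naming convention, the sketch below builds a small pipeline with the same step names as in the question ('t' and 'm') and inspects the keys returned by get_params(). The reduced ColumnTransformer with only a StandardScaler on two columns is an assumption made to keep the example short.

```python
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Same step names as in the question: 't' for the transformer, 'm' for Ridge.
transformer = ColumnTransformer(transformers=[('stand', StandardScaler(), [0, 1])])
pipe = Pipeline(steps=[('t', transformer), ('m', Ridge())])

# Every key here is a valid name for a GridSearchCV parameter grid.
param_names = sorted(pipe.get_params().keys())
ridge_params = [name for name in param_names if name.startswith('m__')]
print(ridge_params)

print('m__alpha' in param_names)      # True: prefix is the step name 'm'
print('ridge__alpha' in param_names)  # False: no step is called 'ridge'
```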
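Note also that the 'poly' entry in your list has a misplaced parenthesis: ('poly'),PolynomialFeatures(degree=2),[1,2,3,4,5,6] adds three separate items to the list instead of one (name, transformer, columns) tuple. That is most likely what raises the ValueError, since the bare string 'poly' cannot be unpacked into three values.
I would do it like this: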
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, OrdinalEncoder, StandardScaler
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
import numpy as np

# X and y are defined as in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Grid keys use <step>__<hyperparameter>; the Ridge step is named 'm'
params = {'m__alpha': np.arange(0, 100, 1).tolist()}

# Each entry is a (name, transformer, columns) tuple
t = [
    ('labelenc', OrdinalEncoder(), [0]),
    ('stand', StandardScaler(), [1, 2, 3, 4, 5, 6]),
    ('poly', PolynomialFeatures(degree=2), [1, 2, 3, 4, 5, 6]),
]
transformer = ColumnTransformer(transformers=t)
pipe = Pipeline(steps=[('t', transformer), ('m', Ridge())])

grid_ridge2_rmse = GridSearchCV(pipe, params, cv=10, scoring='neg_root_mean_squared_error', n_jobs=-1)
results_ridge2_rmse = grid_ridge2_rmse.fit(X_train, y_train)
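Once the search has been fitted, the selected alpha and its score can be read back from the GridSearchCV object. The following is a minimal sketch on synthetic data (X_demo and y_demo are made up, and the pipeline is reduced to a StandardScaler plus Ridge with a three-value grid, all assumptions for illustration). Since the scorer is neg_root_mean_squared_error, the best score is negated to get an RMSE.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the question's dataset.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(60, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=60)

pipe = Pipeline(steps=[('t', StandardScaler()), ('m', Ridge())])
params = {'m__alpha': [0.1, 1.0, 10.0]}
grid = GridSearchCV(pipe, params, cv=3, scoring='neg_root_mean_squared_error')
grid.fit(X_demo, y_demo)

# best_score_ is the negated RMSE, so flip the sign to report an error value.
best_alpha = grid.best_params_['m__alpha']
best_rmse = -grid.best_score_
print(best_alpha, best_rmse)
```

With the default refit=True, grid.best_estimator_ is the whole pipeline refitted on all the training data, so it can be used directly for predictions on X_test.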