Sklearn pipeline transform specific columns - ValueError: too many values to unpack (expected 2)
I am trying to build a pipeline with a scaler, a OneHotEncoder, polynomial features, and finally a linear regression model.
from sklearn.pipeline import Pipeline
pipeline = Pipeline([
('scaler', StandardScaler(), num_cols),
('polynom', PolynomialFeatures(3), num_cols),
('encoder', OneHotEncoder(), cat_cols),
('linear_regression', LinearRegression() )
])
But when I fit the pipeline, I get ValueError: too many values to unpack (expected 2).
pipeline.fit(x_train,y_train)
pipeline.score(x_test, y_test)
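If I understand correctly, you want to apply some of the pipeline steps to specific columns only. Appending the column names to the end of a pipeline step is not valid and is what causes the error, because each Pipeline step must be a (name, estimator) pair. Use a ColumnTransformer instead. Here you can find another similar example.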
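To see why a third element in a step triggers this exact message, here is a minimal sketch in plain Python. It does not reproduce scikit-learn's internal code; the helper `unpack_step` is hypothetical and the strings merely stand in for estimator objects. It only assumes that each step gets unpacked into two names somewhere along the way.

```python
def unpack_step(step):
    """Unpack a step the way a (name, estimator) pair would be unpacked."""
    name, estimator = step  # fails if the step has more than two elements
    return name, estimator


# A valid two-element step (a string stands in for the estimator).
good_step = ("scaler", "StandardScaler()")

# A step with column names appended, as in the question.
bad_step = ("scaler", "StandardScaler()", ["n1", "n2"])

good_result = unpack_step(good_step)

try:
    unpack_step(bad_step)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Prints a "too many values to unpack" message like the one in the question.
print(error_message)
```

The column selection therefore has to live somewhere else, which is the job of ColumnTransformer: its entries are (name, transformer, columns) triples.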
For your case, you could do something like this:
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer
# Fake data.
train_data = pd.DataFrame({'n1': range(10), 'n2': range(10)})
train_data['c1'] = 0
# Use .loc rather than chained indexing, which may fail to modify the DataFrame.
train_data.loc[5:, 'c1'] = 1
y_train = [0]*10
y_train[5:] = [1]*5
# Here I assumed you are using a DataFrame. If not, use integer indices instead of column names.
num_cols = ['n1', 'n2']
cat_cols = ['c1']
# Pipeline to transform the numerical features.
numerical_transformer = Pipeline([('scaler', StandardScaler()),
                                  ('polynom', PolynomialFeatures(3))
                                  ])
# Apply the numerical transformer only on the numerical columns.
# Separately, apply the OneHotEncoder to the categorical columns.
ct = ColumnTransformer([('num_transformer', numerical_transformer, num_cols),
                        ('encoder', OneHotEncoder(), cat_cols)])
# Main pipeline for fitting.
pipeline = Pipeline([
    ('column_transformer', ct),
    ('linear_regression', LinearRegression())
])
pipeline.fit(train_data, y_train)
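If the data is a NumPy array rather than a DataFrame, the same layout can be sketched with integer column indices in place of column names. The toy array, the names `num_idx` and `cat_idx`, and the choice of `handle_unknown="ignore"` are assumptions made for illustration, not part of the original question.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Hypothetical toy data: columns 0 and 1 are numeric, column 2 is categorical.
X = np.column_stack([
    np.arange(10),
    np.arange(10) ** 2,
    [0] * 5 + [1] * 5,
]).astype(float)
y = np.array([0] * 5 + [1] * 5, dtype=float)

num_idx = [0, 1]  # integer positions instead of column names
cat_idx = [2]

# Scale, then expand, the numeric columns only.
numerical_transformer = Pipeline([
    ("scaler", StandardScaler()),
    ("polynom", PolynomialFeatures(3)),
])

# Route each group of columns to its own transformer.
ct = ColumnTransformer([
    ("num_transformer", numerical_transformer, num_idx),
    # Assumption: ignore categories at predict time that were unseen during fit.
    ("encoder", OneHotEncoder(handle_unknown="ignore"), cat_idx),
])

model = Pipeline([
    ("column_transformer", ct),
    ("linear_regression", LinearRegression()),
])

model.fit(X, y)
predictions = model.predict(X)

# Shape of the feature matrix handed to the regression step.
transformed_shape = model.named_steps["column_transformer"].transform(X).shape
print(predictions.shape, transformed_shape)
```

With real data you would call `model.score(x_test, y_test)` on a held-out set, as in the question, rather than evaluating on the training rows.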
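Schematically, the pipeline is laid out as follows: the ColumnTransformer sends num_cols through StandardScaler and then PolynomialFeatures, sends cat_cols through OneHotEncoder, concatenates the two outputs, and passes the result to LinearRegression.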