Sklearn pipeline transform specific columns - ValueError: too many values to unpack (expected 2)

I'm trying to build a pipeline with a scaler, a OneHotEncoder, polynomial features and, finally, a linear regression model:

from sklearn.pipeline import Pipeline
pipeline = Pipeline([
                    ('scaler', StandardScaler(), num_cols),
                    ('polynom', PolynomialFeatures(3), num_cols), 
                    ('encoder', OneHotEncoder(), cat_cols),
                   ('linear_regression', LinearRegression() )
])

But when I fit the pipeline, I get ValueError: too many values to unpack (expected 2)

pipeline.fit(x_train,y_train)
pipeline.score(x_test, y_test)

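If I understand correctly, you want to apply some steps of the pipeline only to specific columns. Appending the column names to the pipeline steps is not supported and causes the error: Pipeline expects every step to be a (name, estimator) pair, so when it unpacks a three-element tuple such as ('scaler', StandardScaler(), num_cols) it fails with "too many values to unpack (expected 2)". Instead, you have to use ColumnTransformer. Here you can find another similar example.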
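Pipeline only accepts steps of the form (name, estimator). The following minimal sketch reproduces the error by passing a three-element step, then shows the same pipeline with proper two-element steps. The toy DataFrame and its column names are made up for illustration. The exact place where scikit-learn unpacks the steps may differ between versions, but in each case the result is a ValueError.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data, only used to trigger the error.
X = pd.DataFrame({"n1": [0.0, 1.0, 2.0, 3.0], "n2": [1.0, 3.0, 5.0, 7.0]})
y = [0.0, 1.0, 2.0, 3.0]

# Wrong: the first step is a 3-tuple (name, estimator, columns).
bad_pipeline = Pipeline([
    ("scaler", StandardScaler(), ["n1", "n2"]),
    ("linear_regression", LinearRegression()),
])

try:
    bad_pipeline.fit(X, y)
    error_message = None
except ValueError as exc:
    # The message mentions "too many values to unpack".
    error_message = str(exc)

print(error_message)

# Right: every step is a 2-tuple (name, estimator).
good_pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("linear_regression", LinearRegression()),
])
good_pipeline.fit(X, y)
```

Column selection isn't part of a Pipeline step. That is the job of ColumnTransformer, which takes (name, transformer, columns) triples.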

In your case, you could do this:

import pandas as pd

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import PolynomialFeatures
from sklearn.preprocessing import OneHotEncoder
from sklearn.linear_model import LinearRegression
from sklearn.compose import ColumnTransformer

# Fake data.
train_data = pd.DataFrame({'n1': range(10), 'n2': range(10)})
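train_data['c1'] = 0
# Use .loc instead of chained indexing, which pandas warns about and which has no effect under Copy-on-Write.
train_data.loc[5:, 'c1'] = 1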
y_train = [0]*10
y_train[5:] = [1]*5

# Here I assumed you are using a DataFrame. If not, use integer indices instead of column names.
num_cols = ['n1', 'n2']
cat_cols = ['c1']


# Pipeline to transform the numerical features.
numerical_transformer = Pipeline([('scaler', StandardScaler()),
                                  ('polynom', PolynomialFeatures(3))])

# Apply the numerical transformer only to the numerical columns.
# Separately, apply the OneHotEncoder to the categorical columns.
ct = ColumnTransformer([('num_transformer', numerical_transformer, num_cols),
                        ('encoder', OneHotEncoder(), cat_cols)])

# Main pipeline for fitting.
pipeline = Pipeline([
                   ('column_transformer', ct),
                   ('linear_regression', LinearRegression() )
])

pipeline.fit(train_data, y_train)
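The comment in the code above says that plain arrays need integer indices instead of column names. Here is a minimal sketch of that variant. It assumes a hypothetical NumPy array whose first two columns are numerical and whose third column holds the category codes. The names num_idx and cat_idx are illustrative.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Hypothetical array: columns 0 and 1 are numerical, column 2 is categorical.
X_train = np.column_stack([
    np.arange(10, dtype=float),
    np.arange(10, dtype=float) ** 2,
    np.array([0] * 5 + [1] * 5, dtype=float),
])
y_train = np.array([0] * 5 + [1] * 5, dtype=float)

num_idx = [0, 1]  # positional indices of the numerical columns
cat_idx = [2]     # positional index of the categorical column

numerical_transformer = Pipeline([
    ("scaler", StandardScaler()),
    ("polynom", PolynomialFeatures(3)),
])

ct = ColumnTransformer([
    ("num_transformer", numerical_transformer, num_idx),
    ("encoder", OneHotEncoder(), cat_idx),
])

pipeline = Pipeline([
    ("column_transformer", ct),
    ("linear_regression", LinearRegression()),
])
pipeline.fit(X_train, y_train)

# 2 numerical columns -> 10 polynomial terms (degree <= 3, including the bias),
# 1 categorical column with 2 categories -> 2 one-hot columns.
print(pipeline[:-1].transform(X_train).shape)  # (10, 12)
```

Slicing with pipeline[:-1] returns every step except the final regressor. This is a convenient way to check how many features actually reach LinearRegression.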

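Schematically, the pipeline is laid out as follows:

Pipeline
    column_transformer (ColumnTransformer)
        num_transformer on num_cols: StandardScaler -> PolynomialFeatures(3)
        encoder on cat_cols: OneHotEncoder
    linear_regression (LinearRegression)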
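The question ends with pipeline.score(x_test, y_test). The sketch below rebuilds the answer's pipeline, scores it on a small held-out DataFrame, and walks the fitted layout through named_steps and transformers_. The test rows are invented for illustration. They only use categories seen during training, because OneHotEncoder() with default settings raises an error on unknown categories.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, PolynomialFeatures, StandardScaler

# Same fake training data as in the answer.
train_data = pd.DataFrame({"n1": range(10), "n2": range(10)})
train_data["c1"] = 0
train_data.loc[5:, "c1"] = 1
y_train = [0] * 5 + [1] * 5

# Hypothetical held-out rows; c1 only uses categories seen in training.
test_data = pd.DataFrame({"n1": [2, 7], "n2": [2, 7], "c1": [0, 1]})
y_test = [0, 1]

num_cols = ["n1", "n2"]
cat_cols = ["c1"]

numerical_transformer = Pipeline([
    ("scaler", StandardScaler()),
    ("polynom", PolynomialFeatures(3)),
])
ct = ColumnTransformer([
    ("num_transformer", numerical_transformer, num_cols),
    ("encoder", OneHotEncoder(), cat_cols),
])
pipeline = Pipeline([
    ("column_transformer", ct),
    ("linear_regression", LinearRegression()),
])

pipeline.fit(train_data, y_train)
r2 = pipeline.score(test_data, y_test)  # R^2 for a regressor

# Inspect the fitted layout: which transformer was applied to which columns.
ct_fitted = pipeline.named_steps["column_transformer"]
for name, transformer, columns in ct_fitted.transformers_:
    print(name, type(transformer).__name__, columns)
print("R^2 on test rows:", r2)
```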