Tuning the Polynomial Feature for Logistic Regression in Python

Question

Suppose I want to include second-degree polynomial terms in my logistic model (which has two predictors), as in my attempt below:

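from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# df is my existing DataFrame with the response Y and the predictors x0, x1
df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(df_poly.drop('Y', axis=1),
                                                    df_poly['Y'], test_size=0.20,
                                                    random_state=10)

poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)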

I get coefficients for x0, x1, x0^2, x1^2, and x0*x1.

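Instead, I would like to adjust this process so that I fit only x0, x1, x0^2, and x0*x1. That is, I want to exclude the x1^2 term. Is there a way to do this with the sklearn library?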

Answer 1

I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer:

import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer

X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})
ct = ColumnTransformer([
    # X0, X1, and the interaction X0*X1 (interaction_only=True skips the pure squares)
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    # add back the square of X0 only
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])
poly = ct.fit_transform(X)
poly  # columns: X0, X1, X0*X1, X0^2

array([[  0.,  10.,   0.,   0.],
       [  1.,  11.,  11.,   1.],
       [  2.,  12.,  24.,   4.],
       [  3.,  13.,  39.,   9.],
       [  4.,  14.,  56.,  16.],
       [  5.,  15.,  75.,  25.],
       [  6.,  16.,  96.,  36.],
       [  7.,  17., 119.,  49.],
       [  8.,  18., 144.,  64.],
       [  9.,  19., 171.,  81.]])
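The same ColumnTransformer can presumably take the place of the PolynomialFeatures step in the question's pipeline. The sketch below is an illustration under assumptions, not a tested drop-in: the data is synthetic and made up for the example, and `square`, `df_demo`, `y_demo`, `features`, and `pipe_demo` are hypothetical names. A named function is used instead of a lambda so that the fitted pipeline can be pickled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures


def square(values):
    """Element-wise square of the selected column(s)."""
    return values ** 2


# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(0)
df_demo = pd.DataFrame({'x0': rng.normal(size=200), 'x1': rng.normal(size=200)})
y_demo = (df_demo['x0'] + df_demo['x0'] * df_demo['x1'] > 0).astype(int)

features = ColumnTransformer([
    # produces x0, x1, x0*x1
    ('poly_x0x1',
     PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
     ['x0', 'x1']),
    # produces x0^2
    ('sq_x0', FunctionTransformer(func=square), ['x0']),
])

pipe_demo = Pipeline([
    ('features', features),
    ('logistic_regression', LogisticRegression()),
])
pipe_demo.fit(df_demo, y_demo)

# One coefficient each for x0, x1, x0*x1, x0^2; no x1^2 term is fitted.
coefs = pipe_demo.named_steps['logistic_regression'].coef_
print(coefs.shape)
```

Because the transformer selects columns by name, the pipeline expects a DataFrame with columns `x0` and `x1`, both when fitting and when predicting.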
