Tuning the Polynomial Feature for Logistic Regression in Python
Suppose I want to incorporate a second-degree polynomial into my logistic model (which has two predictors), as in my attempt below:
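from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

df_poly = df[['Y', 'x0', 'x1']].copy()
X_train, X_test, Y_train, Y_test = train_test_split(df_poly.drop('Y', axis=1),
                                                    df_poly['Y'], test_size=0.20,
                                                    random_state=10)
poly = PolynomialFeatures(degree=2, interaction_only=False, include_bias=False)
lr = LogisticRegression()
pipe = Pipeline([('polynomial_features', poly), ('logistic_regression', lr)])
pipe.fit(X_train, Y_train)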
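I would get coefficients for x0, x1, x0^2, x1^2, and x0*x1.
Instead, I want to tune this process so that I fit only x0, x1, x0^2, and x0*x1. That is, I want to rule out the x1^2 term. Is there a way to do this with the sklearn library?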
I would use a combination of ColumnTransformer, PolynomialFeatures, and FunctionTransformer:
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import PolynomialFeatures, FunctionTransformer

X = pd.DataFrame({'X0': np.arange(10), 'X1': np.arange(10, 20)})

ct = ColumnTransformer([
    # interaction_only=True yields X0, X1 and X0*X1, with no squared terms
    ('poly_X0X1', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False), ['X0', 'X1']),
    # add the square of X0 only, so X1^2 never enters the model
    ('poly_x0', FunctionTransformer(func=lambda x: x**2), ['X0']),
])

poly = ct.fit_transform(X)
poly  # columns: X0, X1, X0*X1, X0^2
array([[ 0., 10., 0., 0.],
[ 1., 11., 11., 1.],
[ 2., 12., 24., 4.],
[ 3., 13., 39., 9.],
[ 4., 14., 56., 16.],
[ 5., 15., 75., 25.],
[ 6., 16., 96., 36.],
[ 7., 17., 119., 49.],
[ 8., 18., 144., 64.],
[ 9., 19., 171., 81.]])
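To connect this back to the question, here is a minimal sketch of how the same ColumnTransformer might be dropped into the original pipeline in place of the plain PolynomialFeatures step. The synthetic df, the helper name square, and the step names "features" and "square_x0" are assumptions made for illustration; they are not from the original post. A named function is used instead of a lambda only so the fitted pipeline can be pickled.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer, PolynomialFeatures


def square(x):
    """Element-wise square (hypothetical helper; named so it can be pickled)."""
    return x ** 2


# Synthetic stand-in for the asker's df (assumption: columns Y, x0, x1).
rng = np.random.default_rng(10)
df = pd.DataFrame({"x0": rng.normal(size=200), "x1": rng.normal(size=200)})
signal = df["x0"] ** 2 + df["x0"] * df["x1"] + rng.normal(scale=0.5, size=200)
df["Y"] = (signal > 0.5).astype(int)

X_train, X_test, Y_train, Y_test = train_test_split(
    df[["x0", "x1"]], df["Y"], test_size=0.20, random_state=10
)

# Output columns, in order: x0, x1, x0*x1, x0^2 (x1^2 is never generated).
features = ColumnTransformer([
    ("poly_x0x1",
     PolynomialFeatures(degree=2, interaction_only=True, include_bias=False),
     ["x0", "x1"]),
    ("square_x0", FunctionTransformer(func=square), ["x0"]),
])

pipe = Pipeline([
    ("features", features),
    ("logistic_regression", LogisticRegression()),
])
pipe.fit(X_train, Y_train)

# One coefficient per generated feature.
coef = pipe.named_steps["logistic_regression"].coef_
print(coef.shape)
```

With this setup the fitted model should expose four coefficients, matching the order x0, x1, x0*x1, x0^2 produced by the transformer.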