Can scikit-learn's LogisticRegression() automatically normalize input data to z-scores?
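Is there a way to have an instance of LogisticRegression() automatically normalize the data supplied for fitting/training to z-scores when building the model? LinearRegression() has a normalize=True parameter, but maybe this doesn't make sense for LogisticRegression()?
If so, do I have to normalize the unlabeled input vectors by hand (i.e., recalculate the mean and standard deviation of each column) before calling predict_proba()? That would be strange if the model has already performed these potentially costly calculations.
Thanks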
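For reference, this is the per-column z-score in question. The mean and standard deviation of column j are estimated once on the n training rows (scikit-learn's StandardScaler uses the biased estimator, dividing by n rather than n - 1), and any later vector x is standardized with those same stored values:

```latex
\mu_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \qquad
\sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \mu_j\right)^2}, \qquad
z_j = \frac{x_j - \mu_j}{\sigma_j}
```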
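Is this what you're looking for? LogisticRegression() itself has no such option, but you can chain a StandardScaler in front of it in a pipeline. The scaler learns each column's mean and standard deviation during fit() and reuses those stored values whenever you call the pipeline's predict() or predict_proba(), so you don't have to recompute anything by hand: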
from sklearn.datasets import make_classification
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.linear_model import LogisticRegression
X, y = make_classification(n_samples=1000, n_features=100, weights=[0.1, 0.9], random_state=0)
X.shape  # (1000, 100)
# build the pipeline: first standardize by subtracting the mean and dividing by the std,
# then do the classification
# (class_weight='auto' has been replaced by 'balanced' in current scikit-learn releases)
pipe = make_pipeline(StandardScaler(), LogisticRegression(class_weight='balanced'))
# fit: the scaler computes and stores the per-column mean/std here
pipe.fit(X, y)
# predict: the stored mean/std are reused to standardize the input
pipe.predict_proba(X)
# to get back the mean/std
scaler = pipe.steps[0][1]
scaler.mean_
# Out: array([ 0.0313, -0.0334, 0.0145, ..., -0.0247, 0.0191, 0.0439])
scaler.scale_  # called std_ in old scikit-learn releases
# Out: array([ 1. , 1.0553, 0.9805, ..., 1.0033, 1.0097, 0.9884])
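To check the second part of the question directly, here is a minimal sketch under stated assumptions: the same synthetic make_classification data, with the last 200 rows standing in for a hypothetical batch of unlabeled inputs X_new, and max_iter=1000 added only to avoid convergence warnings. It fits the pipeline on the first 800 rows, then compares pipe.predict_proba(X_new) with applying the stored mean_ and scale_ by hand before calling the fitted classifier. If the two agree, the pipeline is reusing the training statistics rather than recomputing them on the new data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=1000, n_features=100, weights=[0.1, 0.9], random_state=0)
X_train, y_train = X[:800], y[:800]
X_new = X[800:]  # hypothetical stand-in for unlabeled data; its labels are never used

# max_iter raised only to avoid convergence warnings (an assumption, not required)
pipe = make_pipeline(StandardScaler(), LogisticRegression(class_weight='balanced', max_iter=1000))
pipe.fit(X_train, y_train)

# make_pipeline names each step after its lowercased class name
scaler = pipe.named_steps['standardscaler']
clf = pipe.named_steps['logisticregression']

# z-scores for the new rows, using only the statistics stored during fit()
Z_new = (X_new - scaler.mean_) / scaler.scale_

proba_pipe = pipe.predict_proba(X_new)
proba_manual = clf.predict_proba(Z_new)

print(np.allclose(proba_pipe, proba_manual))               # True
print(np.allclose(scaler.mean_, X_train.mean(axis=0)))     # True: stats come from the training rows
```

scaler.transform(X_new) applies exactly the same subtraction and division, so it yields the same Z_new matrix if you ever need the standardized values on their own.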