How to transform new data when an ML model is loaded from pickle?

Question

I pickled a KNN model that was trained on data scaled with StandardScaler.

X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

When I load the model and pass new values through StandardScaler().transform(), I get an error:

sklearn.exceptions.NotFittedError: This StandardScaler instance is not
fitted yet. Call 'fit' with appropriate arguments before using this estimator.

I am trying to load the values from a dictionary:

dic = {'a':1, 'b':32323, 'c':12}

sc = StandardScaler()  # a new, unfitted scaler

load = pickle.load(open('KNN.mod', 'rb'))

load.predict(sc.transform([[dic['a'], dic['b'], dic['c']]]))  # raises NotFittedError

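As far as I understand from the error, I have to fit sc on the new data. But if I do that, I get wrong predictions. I am not sure whether I am overfitting or something else is wrong: random forest and decision tree handle this data fine without sc, and logistic regression is semi-OK.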
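The wrong predictions after refitting can be reproduced with a small sketch. The training values below are made up for illustration and are not from the original data; only the new row comes from the dictionary above. Fitting a fresh StandardScaler on a single row sets the mean to that row, so every scaled feature collapses to 0 and the model always sees the same input.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: three features on very different scales.
X_train = np.array([[1.0, 30000.0, 10.0],
                    [2.0, 32000.0, 12.0],
                    [3.0, 34000.0, 14.0]])
new_row = np.array([[1.0, 32323.0, 12.0]])

# Reusing the scaler fitted on the training data keeps the new row
# on the same scale the model was trained on.
train_scaler = StandardScaler().fit(X_train)
scaled_ok = train_scaler.transform(new_row)

# Fitting a fresh scaler on the single new row: the mean equals the row
# itself, so every feature is mapped to 0 regardless of its value.
scaled_bad = StandardScaler().fit_transform(new_row)

print(scaled_ok)
print(scaled_bad)
```

This suggests the scaler fitted during training has to be saved and reused, rather than fitted again at prediction time.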

Answer 1

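You need to train and pickle the entire machine learning pipeline together, so that the fitted scaler is saved along with the model. This can be done with sklearn's Pipeline utility. In your case it would look like: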

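import pickle

from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# KNeighborsClassifier is assumed here; NearestNeighbors is unsupervised and has no predict()
pipeline = Pipeline([('scaler', StandardScaler()), ('knn', KNeighborsClassifier())])
pipeline.fit(X_train, y_train)  # pass the raw, unscaled training data

# save the fitted ML pipeline (scaler and model together)
with open('KNN_pipeline.pkl', 'wb') as f:
    pickle.dump(pipeline, f)

# load the ML pipeline and do prediction; scaling is applied automatically
with open('KNN_pipeline.pkl', 'rb') as f:
    pipeline = pickle.load(f)
pipeline.predict(X_test)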
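As a usage illustration, here is a minimal end-to-end sketch that feeds the dictionary from the question into a loaded pipeline. The training data is randomly generated and purely hypothetical, KNeighborsClassifier with n_neighbors=3 is an assumed choice of KNN estimator, and the pickle round trip is done in memory with pickle.dumps / pickle.loads so the snippet runs without touching the file system.

```python
import pickle

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical training data: three features ('a', 'b', 'c') and a binary label.
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[1.0, 30000.0, 10.0],
                     scale=[1.0, 5000.0, 3.0],
                     size=(50, 3))
y_train = (X_train[:, 1] > 30000.0).astype(int)

# Fit scaler and model together on the raw (unscaled) training data.
pipeline = Pipeline([('scaler', StandardScaler()),
                     ('knn', KNeighborsClassifier(n_neighbors=3))])
pipeline.fit(X_train, y_train)

# Round-trip through pickle in memory; the fitted scaler travels with the model.
blob = pickle.dumps(pipeline)
loaded = pickle.loads(blob)

# New data from the question's dictionary: pass raw values, no manual scaling.
dic = {'a': 1, 'b': 32323, 'c': 12}
row = [[dic['a'], dic['b'], dic['c']]]
prediction = loaded.predict(row)
print(prediction)
```

Because the scaler is a step inside the pipeline, the loaded object applies the training-time mean and standard deviation to the new row before the KNN step sees it.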

machine-learning

pickle

scikit-learn