How to make predictions using K-Nearest Neighbors (KNN) model when data has been normalized (Python)

Question

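I have built a KNN model in Python (module = scikit-learn) using three variables (Age, Distance, Travel Allowance) as predictors, with the aim of using them to predict the target variable (Method of Travel).

When building the model, I had to normalize the data for the three predictor variables (Age, Distance, Travel Allowance). This improved the accuracy of my model compared with not normalizing the data.

Now that I have built the model, I want to make a prediction. But how do I enter the predictor values, given that the model was trained on normalized data?

I want to call KNN.predict([[30,2000,40]]) to make a prediction where Age = 30, Distance = 2000 and Allowance = 40. But since the training data was normalized, I can't think of a way to do this. I used the following code to normalize the data: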
X = preprocessing.StandardScaler().fit(X).transform(X.astype(float))

Answer 1

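Actually, the answer is hidden in the code you provided!

Once you fit an instance of preprocessing.StandardScaler(), it remembers how to scale data: it stores the per-feature mean and standard deviation it learned from X. Try this: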

from sklearn import preprocessing

# scaler is an object that knows how to normalize data points;
# fit() learns the per-feature mean and standard deviation from X
scaler = preprocessing.StandardScaler().fit(X)
# use the scaler to normalize the data points in X
X_normalized = scaler.transform(X.astype(float))
# Note: this is what you have already done, just in two steps,
# so that the scaler object is captured and can be reused.
#
# ... Train your model on X_normalized
#
# Now predict: scale the new point with the SAME fitted scaler first
other_data = [[30, 2000, 40]]
other_data_normalized = scaler.transform(other_data)
KNN.predict(other_data_normalized)

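Note that I used scaler.transform twice, in the same way: once on the training data and once on the new data point. Do not call fit again on the new data, otherwise it would be scaled with different statistics than the ones the model was trained on.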
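To make it concrete what the fitted scaler "remembers", here is a minimal sketch. The training rows are made up for illustration and are not the asker's data. It checks that scaler.transform applies the mean and standard deviation stored on the scaler (the mean_ and scale_ attributes) to a new point:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training rows (assumption): [age, distance, allowance]
X_train = np.array([
    [25.0, 1000.0, 20.0],
    [35.0, 3000.0, 60.0],
    [45.0, 5000.0, 100.0],
])

# fit() stores the per-feature mean and standard deviation
scaler = StandardScaler().fit(X_train)

# A new, unseen point is scaled with the statistics learned from X_train
new_point = np.array([[30.0, 2000.0, 40.0]])
scaled_by_scaler = scaler.transform(new_point)
scaled_by_hand = (new_point - scaler.mean_) / scaler.scale_

# Both routes should give the same values
print(np.allclose(scaled_by_scaler, scaled_by_hand))
```

In other words, the new point is never used to compute the statistics; it is only shifted and rescaled by what was learned from the training data.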

See StandardScaler.transform
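As an alternative to carrying the scaler around by hand, scikit-learn can bundle the scaler and the classifier into one object with make_pipeline, so that predict accepts raw, unscaled values. The sketch below is only an illustration: the data, the labels "walk" and "car", and n_neighbors=3 are assumptions, not taken from the question.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Made-up data (assumption): [age, distance, allowance] and travel method
X = np.array([
    [22.0, 500.0, 10.0],
    [25.0, 800.0, 15.0],
    [28.0, 1000.0, 20.0],
    [40.0, 4000.0, 80.0],
    [45.0, 5000.0, 90.0],
    [50.0, 6000.0, 100.0],
])
y = np.array(["walk", "walk", "walk", "car", "car", "car"])

# The pipeline fits the scaler on X, then trains KNN on the scaled X
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
model.fit(X, y)

# Raw values go in; the pipeline scales them before calling KNN
raw_point = [[30, 2000, 40]]
pipeline_prediction = model.predict(raw_point)

# Equivalent manual route, as in the answer above
scaler = StandardScaler().fit(X)
knn = KNeighborsClassifier(n_neighbors=3).fit(scaler.transform(X), y)
manual_prediction = knn.predict(scaler.transform(raw_point))

print(pipeline_prediction, manual_prediction)
```

Both routes should produce the same prediction; the pipeline simply makes it harder to forget the scaling step at prediction time.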


python

classification

nearest-neighbor

scikit-learn

data-science