Why doesn't normalizing my dataset in Python change the results of my classification method?

Question

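I'm a complete beginner in machine learning and data analysis. Working in Python with the Iris dataset, I ran the k-nearest-neighbors (KNN) classification method and got an accuracy of 0.97 (97%). An exercise asks me to explain what happens if I normalize the input data.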

I normalized it using:

from sklearn.datasets import load_iris
from sklearn import preprocessing
# load the iris dataset
iris = load_iris()
print(iris.data.shape)
# separate the data from the target attributes
X = iris.data
y = iris.target
# normalize the data (by default, each sample/row is scaled to unit L2 norm)
normalized_X = preprocessing.normalize(X)

I then used this normalized_X in my KNN code, but the accuracy didn't change. Is this normal?
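One thing worth checking is what `preprocessing.normalize`, as called in the question, actually does. With its defaults (`norm="l2"`, `axis=1`) it rescales each sample (row) to unit Euclidean length; it does not put each feature (column) on a common scale. A minimal sketch that inspects this on the Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn import preprocessing

X = load_iris().data

# Defaults are norm="l2", axis=1: each row is divided by its own Euclidean length
normalized_X = preprocessing.normalize(X)

# Every sample vector now has length 1
row_norms = np.linalg.norm(normalized_X, axis=1)
print("first row norms:", np.round(row_norms[:5], 3))

# The per-feature (column) spreads are not equalized by this step
print("feature std before:", np.round(X.std(axis=0), 2))
print("feature std after: ", np.round(normalized_X.std(axis=0), 2))
```

If the goal is per-feature scaling, a column-wise scaler such as `StandardScaler` or `MinMaxScaler` is the more usual choice. Also note that all four Iris measurements are in centimeters with broadly similar ranges, so on this particular dataset rescaling may move KNN accuracy only slightly; this is a plausible factor, not a guaranteed explanation.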

Answer 1

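The normalization step is meant to reshape your data space so that the data has roughly the same spread along every feature dimension. In some cases this makes it easier and faster to find a good solution, but it does not guarantee a better solution than no scaling at all. It is still good practice, so you should keep doing it in this and other problems. Scaling also helps gradient-based optimizers (such as stochastic gradient descent) converge to a good solution, sometimes faster, though again without any guarantee of better final performance. You can find some authoritative material on this in the following video by Andrew Ng: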
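To make "roughly the same spread along every feature dimension" concrete, here is a small sketch using scikit-learn's `StandardScaler` on the Iris features. It assumes standardization (zero mean, unit variance per column) as the scaling method; other scalers would equalize spread differently.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data

# Before scaling: each feature (column) has its own spread
print("std before:", np.round(X.std(axis=0), 2))

# StandardScaler works per column: subtract the column mean, divide by the column std
X_std = StandardScaler().fit_transform(X)

# After scaling: every column has mean ~0 and std ~1
print("mean after:", np.round(X_std.mean(axis=0), 2))
print("std after: ", np.round(X_std.std(axis=0), 2))
```

For a distance-based method like KNN, this means no single feature dominates the Euclidean distance just because its values span a wider range.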

https://www.youtube.com/watch?v=gV5fD8Xbwgk

There are many other resources on this topic; just search Google for something like "purpose of feature scaling in machine learning".

Answer 2

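from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

X = load_iris().data
# Standardize each feature (column) to zero mean and unit variance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)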
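To see whether scaling actually changes KNN accuracy, one option is to compare both variants under cross-validation. This sketch assumes 5-fold cross-validation and `n_neighbors=5` (the scikit-learn default); the question states neither. The scaler is placed inside a pipeline so it is re-fit on each training fold only, which keeps test-fold statistics out of the scaling step.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Assumption: k=5; the question does not say which k was used
knn_raw = KNeighborsClassifier(n_neighbors=5)

# StandardScaler is fit on each training fold, then applied to that fold's test split
knn_scaled = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))

scores_raw = cross_val_score(knn_raw, X, y, cv=5)
scores_scaled = cross_val_score(knn_scaled, X, y, cv=5)

print("raw mean accuracy:    %.3f" % scores_raw.mean())
print("scaled mean accuracy: %.3f" % scores_scaled.mean())
```

On a small, similarly scaled dataset like Iris the two means may well be close; the comparison is more informative on data whose features have very different units or ranges.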
