How to standardize data with a certain mean and standard deviation value

Question

How can I standardize a dataset using a specific mean and standard deviation?

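I know there are packages such as sklearn.preprocessing.StandardScaler, but as far as I can tell it only standardizes a dataset using that dataset's own mean and standard deviation. What if I want to standardize the dataset using a mean and standard deviation that I specify myself?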

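Is there a Python package I can use for this? Otherwise, the one approach I can think of is to do it manually for each feature, i.e. compute (X-mean)/(stddev) for every feature in the dataset, where mean is the mean I specify and stddev is the standard deviation I specify.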

Thanks in advance.

Answer 1

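sklearn.preprocessing.StandardScaler exists to standardize data on demand in machine-learning workflows, and it is meant to be used in pipelines. It does work on its own, but for this task that is like using a sledgehammer on a tack. The approach you describe is the simplest way to rescale your data with parameters of your own choosing. My only suggestion is to use arrays: NumPy arrays apply arithmetic operations to all of their entries automatically, so the code looks cleaner.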

import numpy

data = numpy.array([1, 2, 3, 34, 2, 2, 3, 43, 4, 3, 2, 3, 4, 4, 5, 56, 6, 43, 32, 2, 2])

# Custom mean (10) and standard deviation (5).
custom_scaled = (data - 10) / 5

# Using the array's own mean and standard deviation.
self_scaled = (data - data.mean()) / data.std()
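
The question asks about doing this for each feature of a dataset. The following is a minimal sketch of how the same array arithmetic might extend to a 2-D array with one custom mean and standard deviation per column. The function name `standardize`, the variables `custom_mean` and `custom_std`, and the sample values are hypothetical, chosen only for illustration; it assumes rows are samples and columns are features.

```python
import numpy as np

def standardize(X, mean, std):
    """Rescale each column of X with a user-supplied mean and std.

    Hypothetical helper, not part of NumPy or scikit-learn.
    """
    X = np.asarray(X, dtype=float)
    mean = np.asarray(mean, dtype=float)
    std = np.asarray(std, dtype=float)
    # Guard against division by zero for any feature.
    if np.any(std == 0):
        raise ValueError("std must be non-zero for every feature")
    # Broadcasting applies mean[j] and std[j] to every row of column j.
    return (X - mean) / std

# Made-up data: 3 samples, 2 features.
X = np.array([[1.0, 200.0],
              [2.0, 300.0],
              [3.0, 400.0]])
custom_mean = [2.0, 250.0]   # one assumed mean per feature
custom_std = [0.5, 100.0]    # one assumed std per feature

scaled = standardize(X, custom_mean, custom_std)
print(scaled)
```

Passing scalars instead of lists for the mean and standard deviation applies the same values to every feature, which matches the one-dimensional example above.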
