
How to scale test set based on the mean and std from train set in python?

I read the answer explaining "Why feature scaling only to training set?", and the answer was: "standardize any test set using the mean and standard deviation of the training set."

So I tried to fix my earlier mistake. However, when I checked the official document of StandardScaler(), it does not support scaling with a given mean and standard deviation, like this:

from sklearn.preprocessing import StandardScaler
sc = StandardScaler(mean = train_x.mean(), std = train_x.std())
sc.fit(test_x)

# this code is incorrect, but what is the correct code?

So my question is: how do I scale the test set based on the mean and standard deviation of the training set in Python?

According to the official documentation,

with_mean : bool, default=True
If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_std : bool, default=True
If True, scale the data to unit variance (or equivalently, unit standard deviation).

So you can simply do the following: fit the scaler on the training set so that it stores the training mean and standard deviation, and then apply the fitted scaler to the test set.

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(train_x)                        # learn mean and std from the training set only
train_x_scaled = sc.transform(train_x) # scale the training set
test_x_scaled = sc.transform(test_x)   # scale the test set with the training statistics
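For completeness, here is a minimal sketch of what this is equivalent to by hand. The names train_x and test_x come from the question; the toy arrays are my own assumption. The fitted scaler keeps the training statistics in sc.mean_ and sc.scale_, so transforming the test set is the same as subtracting the training mean and dividing by the training standard deviation:

import numpy as np
from sklearn.preprocessing import StandardScaler

train_x = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])  # toy training data (assumption)
test_x = np.array([[2.0, 25.0], [4.0, 15.0]])                # toy test data (assumption)

sc = StandardScaler()
sc.fit(train_x)                      # learn mean and std from the training set
scaled_test = sc.transform(test_x)   # apply those statistics to the test set

# Manual equivalent: subtract the training mean, divide by the training std
manual = (test_x - train_x.mean(axis=0)) / train_x.std(axis=0)
print(np.allclose(scaled_test, manual))  # True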

StandardScaler() only takes with_mean and with_std as boolean flags, meaning these values are either True or False; you cannot pass your own mean and standard deviation to the constructor, the scaler learns them from whatever data you call fit() on.
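As a small illustration of those two flags (a hedged sketch with made-up data, not part of the original question): with_std=False only centers the data, and with_mean=False only divides by the standard deviation.

import numpy as np
from sklearn.preprocessing import StandardScaler

x = np.array([[1.0], [2.0], [3.0]])

center_only = StandardScaler(with_std=False).fit_transform(x)   # subtract the mean, keep the scale
scale_only = StandardScaler(with_mean=False).fit_transform(x)   # divide by the std, keep the offset

print(center_only.ravel())  # [-1.  0.  1.]
print(scale_only.ravel())   # roughly [1.22, 2.45, 3.67] (x divided by its std)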