如何根据 python 中训练集的均值和标准差来缩放测试集?
How to scale test set based on the mean and std from train set in python?
I read the answers to "Why feature scaling only to training set?", which explain that you should "standardize any test set using the training set's mean and standard deviation."
So I tried to fix my earlier mistake. However, when I checked the official documentation for StandardScaler(), it does not support scaling with a given mean and standard deviation, like this:
from sklearn.preprocessing import StandardScaler
sc = StandardScaler(mean = train_x.mean(), var_x = train.std())
sc.fit(test_x)
# this code is incorrect, but what is the correct code?
So, my question is: how do I scale the test set using the training set's mean and standard deviation in Python?
According to the official documentation:
with_mean : bool, default=True
    If True, center the data before scaling. This does not work (and will raise an exception) when attempted on sparse matrices, because centering them entails building a dense matrix which in common use cases is likely to be too large to fit in memory.

with_std : bool, default=True
    If True, scale the data to unit variance (or equivalently, unit standard deviation).
So you cannot pass the mean and standard deviation as constructor arguments. Instead, fit the scaler on the training set so it learns those statistics, then transform the test set with them:

from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
sc.fit(train_x)                        # learn mean and std from the training set
test_x_scaled = sc.transform(test_x)   # scale the test set with the training statistics
StandardScaler() only accepts with_mean and with_std as booleans, meaning these values can only be True or False; the statistics themselves are always learned from the data passed to fit() and exposed afterwards as the mean_ and scale_ attributes.
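If you want to see exactly what fit/transform computes (or avoid the sklearn dependency entirely), the same scaling can be done by hand with NumPy. This is a minimal sketch; the small arrays below are made-up placeholders standing in for your real train_x and test_x:

```python
import numpy as np

# Hypothetical data standing in for the real train/test split.
train_x = np.array([[1.0, 10.0],
                    [2.0, 20.0],
                    [3.0, 30.0]])
test_x = np.array([[2.0, 25.0]])

# Compute the statistics on the TRAINING set only.
mu = train_x.mean(axis=0)    # per-feature mean, same as StandardScaler's mean_
sigma = train_x.std(axis=0)  # per-feature population std, same as scale_

# Apply the training statistics to the test set.
test_x_scaled = (test_x - mu) / sigma
```

Note that NumPy's default std() is the population standard deviation (ddof=0), which matches what StandardScaler uses, so this produces the same result as fitting on train_x and transforming test_x.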