Why do StandardScaler and Normalizer need different data input?
I tried the following code and noticed that StandardScaler (or MinMaxScaler) and Normalizer in sklearn handle data in very different ways. This makes building pipelines harder. I'd like to know whether this design difference is intentional.
from sklearn.preprocessing import StandardScaler, Normalizer, MinMaxScaler
Normalizer reads the data "horizontally" (row by row):
Normalizer(norm='max').fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[ 0.1 , 0.1 , 0.2 , 1. ],
# [ 0.02 , 0. , 0. , 1. ],
# [ 0. , -0.001, -0.001, 1. ]])
StandardScaler and MinMaxScaler read the data "vertically" (column by column):
StandardScaler().fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[ 0. , 1.22474487, 1.33630621, -0.80538727],
# [ 1.22474487, 0. , -0.26726124, -0.60404045],
# [-1.22474487, -1.22474487, -1.06904497, 1.40942772]])
MinMaxScaler().fit_transform([[ 1., 1., 2., 10],
[ 2., 0., 0., 100],
[ 0., -1., -1., 1000]])
#array([[0.5 , 1. , 1. , 0. ],
# [1. , 0.5 , 0.33333333, 0.09090909],
# [0. , 0. , 0. , 1. ]])
This is the expected behavior, because StandardScaler and Normalizer serve different purposes. StandardScaler works "vertically" because it...
Standardize[s] features by removing the mean and scaling to unit variance
[...]
Centering and scaling happen independently on each feature by computing the relevant statistics on the samples in the training set. Mean and standard deviation are then stored to be used on later data using the transform method.
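As a sketch of that column-wise behavior (not the library's actual implementation), the StandardScaler output above can be reproduced with plain NumPy by computing each feature's mean and standard deviation along axis 0:

```python
import numpy as np

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Column-wise standardization: subtract each column's mean and divide
# by its population standard deviation (ddof=0, matching StandardScaler).
standardized = (X - X.mean(axis=0)) / X.std(axis=0)
```

Each column of `standardized` now has mean 0 and unit variance, matching the array printed above.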
whereas Normalizer works "horizontally" because it...
Normalize[s] samples individually to unit norm.
Each sample (i.e. each row of the data matrix) with at least one non zero component is rescaled independently of other samples so that its norm (l1 or l2) equals one.
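Likewise, the row-wise behavior can be sketched in NumPy (again, just an illustration, not sklearn's internal code). With norm='max', each row is divided by its largest absolute value along axis 1:

```python
import numpy as np

X = np.array([[1., 1., 2., 10.],
              [2., 0., 0., 100.],
              [0., -1., -1., 1000.]])

# Row-wise 'max' normalization: divide each row by its largest absolute
# value; keepdims=True makes the (3, 1) divisor broadcast across columns.
normalized = X / np.abs(X).max(axis=1, keepdims=True)
```

This reproduces the Normalizer(norm='max') output shown in the question.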
See the scikit-learn documentation (linked above) for a deeper look at which transformer better suits your purpose.