How should the values of a list be scaled such that they meet standard deviation and mean requirements?

Question

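I have a number of lists of values that I want to scale to meet specific standard deviation and mean requirements. Specifically, I want each dataset standardized to a mean of 0 and a standard deviation of 1, except for datasets in which all values are greater than 0; those I want to scale so that their mean is 1.

What is a good way to do this kind of thing in Python?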

Answer 1

If you're working with data in Python, you're going to want to use the scientific stack (see here), in particular numpy, scipy, and pandas. What you're looking for is the z-score, and that's a common enough operation that it's built into scipy as scipy.stats.zscore.

Starting with a random array that has a non-zero mean and a non-unity standard deviation:

>>> import numpy as np
>>> import scipy.stats
>>> data = np.random.uniform(0, 100, 10**5)
>>> data.mean(), data.std()
(49.950550280158893, 28.910154760235972)

We can renormalize it:

>>> renormed = scipy.stats.zscore(data)
>>> renormed.mean(), renormed.std()
(2.0925483568134951e-16, 1.0)

And shift it if needed, which moves the mean to 1 and leaves the standard deviation unchanged:

>>> if (data > 0).all():
...     renormed += 1
...     
>>> renormed.mean(), renormed.std()
(1.0000000000000002, 1.0)
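To apply the same rule to several lists, the two steps above can be wrapped in a small function. This is only a sketch: the name `standardize` and the sample `datasets` are hypothetical and not part of the original answer.

```python
import numpy as np
from scipy import stats


def standardize(values):
    """Hypothetical helper: z-score the values, then shift the mean to 1
    if every input value is strictly positive."""
    arr = np.asarray(values, dtype=float)
    scaled = stats.zscore(arr)  # mean 0, standard deviation 1 (ddof=0)
    if (arr > 0).all():
        scaled = scaled + 1     # mean becomes 1, standard deviation stays 1
    return scaled


# Illustrative inputs (assumed): one all-positive list, one mixed-sign list.
datasets = [[1.0, 2.0, 3.0, 4.0], [-2.0, 0.0, 1.0, 5.0]]
results = [standardize(d) for d in datasets]

for r in results:
    print(r.mean(), r.std())
```

Note that a list whose values are all identical has a standard deviation of zero, so its z-score is undefined; such inputs would need separate handling.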

We could of course do this manually:

>>> (data - data.mean())/data.std()
array([-0.65558504,  0.24264144, -0.1112242 , ..., -0.40785103,
       -0.52998332,  0.10104563])

(Note that by default this uses a delta degrees of freedom of 0, i.e. the denominator is N. If you want N-1, pass ddof=1.)
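As a small illustration of the ddof remark, the sketch below compares the two conventions on a made-up eight-element array (the sample values are an assumption chosen for easy arithmetic). Both `scipy.stats.zscore` and `ndarray.std` accept a `ddof` argument.

```python
import numpy as np
from scipy import stats

# Made-up sample: mean 5, population standard deviation 2.
data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# Default convention: ddof=0, denominator N.
z_pop = stats.zscore(data)
manual_pop = (data - data.mean()) / data.std()

# Sample convention: ddof=1, denominator N-1.
z_sample = stats.zscore(data, ddof=1)
manual_sample = (data - data.mean()) / data.std(ddof=1)

print(np.allclose(z_pop, manual_pop))        # library and manual forms agree
print(np.allclose(z_sample, manual_sample))  # likewise with ddof=1
```

Whichever convention is chosen, it should be used consistently: an array scaled with ddof=1 has unit standard deviation only when the check also uses `std(ddof=1)`.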
