How to ignore high deviations in a list

Question

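Given lists such as [2.1, 2.01, 6, 2.2, 1.9] and [2, 7.1, 7.2, 6.9], is there a function in numpy (or another library) that removes the numbers that deviate from the others by more than 5%? In these cases, that would be 6 and 2.

The list size is not fixed, and neither is the range of the numbers.

Thanks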

Answer 1

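Try scipy.stats.zscore:

import numpy as np
from scipy.stats import zscore

a = np.array([2.1, 2.01, 6, 2.2, 1.9])
# Keep values whose absolute z-score is below max(a) / 5 (1.2 here), a heuristic threshold
print(a[np.abs(zscore(a)) < max(a) / 5])

Output: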

[ 2.1   2.01  2.2   1.9 ]
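
Since the question mentions lists of varying size and range, the same idea can be wrapped in a function with a fixed cutoff. This is only a sketch: the name drop_by_zscore and the default cutoff of 1.5 are assumptions chosen so that both example lists from the question behave as expected, not values taken from scipy. Note that with the population standard deviation (the zscore default), the absolute z-score of a value in a list of n items cannot exceed sqrt(n - 1), so the commonly used cutoff of 3 would remove nothing from lists this short.

```python
import numpy as np
from scipy.stats import zscore


def drop_by_zscore(values, cutoff=1.5):
    """Return the values whose absolute z-score is below `cutoff`.

    `cutoff` is an illustrative default (an assumption), not a library constant.
    """
    a = np.asarray(values, dtype=float)
    # zscore uses the population standard deviation (ddof=0) by default
    return a[np.abs(zscore(a)) < cutoff]


# The two lists from the question: 6 and 2 should be dropped
print(drop_by_zscore([2.1, 2.01, 6, 2.2, 1.9]))
print(drop_by_zscore([2, 7.1, 7.2, 6.9]))
```

For other data the cutoff would likely need tuning, since a single extreme value also inflates the standard deviation it is measured against.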

Answer 2

import numpy

data = [2.1, 2.01, 6, 2.2, 1.9]
elements = numpy.array(data)

# Mean and standard deviation of the whole list
mean = numpy.mean(elements)
sd = numpy.std(elements)

# Keep only the values that lie within one standard deviation of the mean
final_list = [x for x in data if mean - sd < x < mean + sd]
print(final_list)

[2.1, 2.01, 2.2, 1.9]

Source: https://www.kdnuggets.com/2017/02/removing-outliers-standard-deviation-python.html
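
Neither answer applies a percentage threshold as the question literally asks. A different technique that does is to measure each value's relative deviation from the median, which is less affected by the outlier than the mean is. The following is a minimal sketch under stated assumptions: the function name drop_far_from_median and the 10% default tolerance are hypothetical choices, not part of numpy. A strict 5% tolerance would also drop 1.9 from the first list, because it lies about 9.5% below the median of 2.1.

```python
import numpy as np


def drop_far_from_median(values, tolerance=0.10):
    """Return the values within `tolerance` (a fraction) of the median.

    `tolerance=0.10` is an illustrative default (an assumption).
    """
    a = np.asarray(values, dtype=float)
    med = np.median(a)
    # Compare the absolute deviation with a fraction of the median's magnitude
    return a[np.abs(a - med) <= tolerance * np.abs(med)]


# The two lists from the question: 6 and 2 should be dropped
print(drop_far_from_median([2.1, 2.01, 6, 2.2, 1.9]))
print(drop_far_from_median([2, 7.1, 7.2, 6.9]))
```

This assumes that most of the values cluster around a single level and that the median is not zero; if the median is zero, only values equal to zero would be kept.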

python

math

numpy

standard-deviation

data-science