What is the difference between statistics.stdev() and numpy.std(), and which is more precise?

I used this data set:

lst = [81922.00557103065, 82887.70053475935, 80413.01627033792,
       81708.86075949368, 82997.38219895288, 84641.50943396226,
       81929.82456140351, 82632.24181360201, 77667.98418972333,
       73726.47427854454, 86113.2075471698, 83232.98429319372,
       79866.66666666667, 83833.74689826302, 81943.06930693069,
       77898.64029666255, 77401.47783251232, 80607.59493670886,
       78384.5126835781, 82608.69565217392, 80824.8730964467,
       84163.70106761566, 74887.38738738738
       ]

statistics.stdev(lst) gives 3096.28, while numpy.std(lst) gives 3028.23. The difference is about 2.2%.

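They are computing two slightly different things, so neither is more precise than the other.

The standard deviation is the square root of the variance. By default NumPy divides by N (the uncorrected, or population, variance; ddof=0), whereas statistics.stdev() applies Bessel's correction, dividing by N - 1 instead of N when computing the variance: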

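import numpy as np
arr = np.array(lst)
# NumPy's default (ddof=0): divide by N
var_ordinary = np.sum(np.abs(arr - arr.mean())**2) / arr.size
# Bessel's correction (ddof=1): divide by N - 1
var_bessel = np.sum(np.abs(arr - arr.mean())**2) / (arr.size - 1)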
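
To see that the two libraries agree once they use the same divisor, here is a minimal sketch. The `data` list is a small illustrative sample chosen for easy arithmetic, not the data set from the question. It relies on the `ddof` parameter of `numpy.std` and on `statistics.pstdev`, the divide-by-N counterpart of `statistics.stdev`.

```python
import math
import statistics

import numpy as np

# Small illustrative sample (not the data set from the question).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Divide by N: NumPy's default (ddof=0) and statistics.pstdev agree.
pop_np = np.std(data)
pop_stats = statistics.pstdev(data)

# Divide by N - 1: statistics.stdev and NumPy with ddof=1 agree.
sample_stats = statistics.stdev(data)
sample_np = np.std(data, ddof=1)

print(pop_np, pop_stats)
print(sample_stats, sample_np)

# The two conventions differ by a factor of sqrt(N / (N - 1)).
n = len(data)
ratio = sample_stats / pop_stats
print(ratio, math.sqrt(n / (n - 1)))
```

For the 23 values in the question that factor is sqrt(23 / 22), roughly 1.022, which accounts for the 2.2% gap between the two results.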

From the statistics docs:

This is the sample variance s² with Bessel’s correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance.
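
The "unbiased estimate" claim in the quote can be checked with a quick simulation. This is only a sketch under assumed settings: the normal population, its variance of 4.0, the sample size of 5, the trial count, and the seed are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable

true_var = 4.0    # variance of the hypothetical population (std = 2)
n = 5             # small sample size, where the bias is most visible
trials = 100_000

# Each row is one independent sample of size n.
samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))

# Average each variance estimator over all trials.
mean_var_n = samples.var(axis=1, ddof=0).mean()          # divide by N
mean_var_n_minus_1 = samples.var(axis=1, ddof=1).mean()  # divide by N - 1

print(mean_var_n)          # tends toward true_var * (n - 1) / n = 3.2
print(mean_var_n_minus_1)  # tends toward true_var = 4.0
```

Dividing by N systematically underestimates the population variance when the mean is itself estimated from the same sample, and dividing by N - 1 removes that bias. Note that this applies to the variance: the square root of an unbiased variance estimate is still a slightly biased estimate of the standard deviation.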