How do I calculate standard deviation in Python without using numpy?
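I'm trying to calculate standard deviation in Python without using numpy or any external library other than math. I want to get better at writing algorithms, and I'm doing this as a bit of "homework" while I improve my Python skills. My goal is to translate this formula into Python, but I'm not getting the correct result.
I'm working with a list of speeds, where speeds = [86,87,88,86,87,85,86]
When I run: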
std_dev = numpy.std(speeds)
print(std_dev)
I get: 0.903507902905. But I don't want to depend on numpy. So...
My implementation is as follows:
import math

speeds = [86,87,88,86,87,85,86]

def get_mean(array):
    sum = 0
    for i in array:
        sum = sum + i
    mean = sum/len(array)
    return mean

def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2
        return array
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + i
        return sum_sqr_diff
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev

std_dev = get_std_dev(speeds)
print(std_dev)
Now when I run:
std_dev = get_std_dev(speeds)
print(std_dev)
I get: [0]
But I'm expecting 0.903507902905
What am I missing here?
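The problem is the return inside the for loop. You need to get rid of it and accumulate the squared differences in a single loop: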
def get_std_dev(array):
    # get mu
    mean = get_mean(array)
    sum_sqr_diff = 0
    # get sigma
    for i in array:
        sum_sqr_diff = sum_sqr_diff + (i - mean)**2
    # get mean of squared differences
    variance = 1/len(array)
    mean_sqr_diff = (variance * sum_sqr_diff)
    std_dev = math.sqrt(mean_sqr_diff)
    return std_dev
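A quick way to check a hand-written version like this is to compare it against the standard library. The sketch below restates both functions so it runs on its own, and assumes Python 3 (so that / is true division). statistics.pstdev computes the population standard deviation, which is the same quantity as this formula.

```python
import math
import statistics

def get_mean(array):
    # Arithmetic mean; assumes a non-empty sequence of numbers.
    return sum(array) / len(array)

def get_std_dev(array):
    # Population standard deviation: square root of the mean squared deviation.
    mean = get_mean(array)
    sum_sqr_diff = 0
    for x in array:
        sum_sqr_diff += (x - mean) ** 2
    return math.sqrt(sum_sqr_diff / len(array))

speeds = [86, 87, 88, 86, 87, 85, 86]
result = get_std_dev(speeds)

# Compare with a tolerance rather than ==, since the last float digits may differ.
matches = math.isclose(result, statistics.pstdev(speeds))
print(result, matches)
```

Comparing with math.isclose rather than == avoids false alarms from rounding differences between the two implementations.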
There are a few problems in your code, one of which is the return inside the for statement. You can try this:
import math

def get_mean(array):
    return sum(array) / len(array)

def get_std_dev(array):
    n = len(array)
    mean = get_mean(array)
    # squared deviation of each item from the mean
    squares_arr = []
    for item in array:
        squares_arr.append((item - mean) ** 2)
    return math.sqrt(sum(squares_arr) / n)
If you don't want to use numpy, you can also try Python's built-in statistics module:
import statistics
st_dev = statistics.pstdev(speeds)
print(st_dev)
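Note that the statistics module offers two functions, and which one you call matters. A short sketch of the difference, using the speeds list from the question (the variable names are mine):

```python
import statistics

speeds = [86, 87, 88, 86, 87, 85, 86]

# Population standard deviation: divides by n.
# This is the one that matches numpy.std with its default arguments.
population_sd = statistics.pstdev(speeds)

# Sample standard deviation: divides by n - 1, so it is slightly larger.
sample_sd = statistics.stdev(speeds)

print(population_sd)
print(sample_sd)
```

If your result is close to numpy's but not equal, mixing up these two is a likely cause.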
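Or, if you'd still prefer a custom solution, I suggest the following approach, which uses a generator expression instead of the more complicated, buggy loops: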
import math
mean = sum(speeds) / len(speeds)
var = sum((l-mean)**2 for l in speeds) / len(speeds)
st_dev = math.sqrt(var)
print(st_dev)
speeds = [86,87,88,86,87,85,86]
# Calculate the mean of the values in your list
mean_speeds = sum(speeds) / len(speeds)
# Calculate the variance of the values in your list
# This is 1/N * sum((x - mean(X))^2)
var_speeds = sum((x - mean_speeds) ** 2 for x in speeds) / len(speeds)
# Take the square root of variance to get standard deviation
sd_speeds = var_speeds ** 0.5
>>> sd_speeds
0.9035079029052513
The problem in your code is that you reuse array and return in the middle of the loop:
def get_std_dev(array):
    # get mu
    mean = get_mean(array)  # <-- this is 86.4
    # (x[i] - mu)**2
    for i in array:
        array = (i - mean) ** 2  # <-- this is almost 0
        return array  # <-- this is the value returned
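To see why a return inside the loop body cuts things short, here is a small sketch with two hypothetical helper functions (the names are mine, not from the question). The first returns during the first iteration, the second only after the loop has finished.

```python
def first_only(values):
    # The return is inside the loop body, so the function exits
    # during the first iteration and never sees the other items.
    for v in values:
        return v * 2

def all_items(values):
    # Collect inside the loop, return once after the loop has finished.
    doubled = []
    for v in values:
        doubled.append(v * 2)
    return doubled

print(first_only([1, 2, 3]))  # 2
print(all_items([1, 2, 3]))   # [2, 4, 6]
```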
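Now let's look at the algorithm you're using. Note that there are two commonly used formulas for standard deviation. The first divides by n (the population standard deviation) and the second by n - 1 (the sample standard deviation), so which one is appropriate depends on whether your data is the whole population or a sample drawn from it.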
sqrt(sum((x - mean)^2) / n)
or
sqrt(sum((x - mean)^2) / (n -1))
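For large values of n the two give nearly the same result, because the -1 is insignificant, and the first formula is commonly used. The first formula can be simplified to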
sqrt(sum(x^2) /n - mean^2)
So how would you do this in Python?
def std_dev1(array):
    n = len(array)
    mean = sum(array) / n
    sumsq = sum(v * v for v in array)
    return (sumsq / n - mean * mean) ** 0.5
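One caveat about the simplified formula: it subtracts two numbers that are nearly equal whenever the mean is large compared with the spread, so floating-point rounding can eat into the result. The sketch below compares it with the two-pass version. The function names and the shifted data set are illustrative assumptions, not part of the original answer.

```python
def std_dev_shortcut(array):
    # sqrt(sum(x^2)/n - mean^2): compact, but prone to cancellation error.
    n = len(array)
    mean = sum(array) / n
    sumsq = sum(v * v for v in array)
    # max(..., 0.0) guards against a tiny negative value caused by rounding.
    return max(sumsq / n - mean * mean, 0.0) ** 0.5

def std_dev_two_pass(array):
    # sqrt(sum((x - mean)^2)/n): subtracts the mean before squaring.
    n = len(array)
    mean = sum(array) / n
    return (sum((v - mean) ** 2 for v in array) / n) ** 0.5

speeds = [86, 87, 88, 86, 87, 85, 86]
print(std_dev_shortcut(speeds))
print(std_dev_two_pass(speeds))

# Same spread, shifted by a large offset. The true standard deviation is
# unchanged; the two-pass version stays close to it, while the shortcut
# may drift noticeably.
shifted = [s + 1e9 for s in speeds]
print(std_dev_shortcut(shifted))
print(std_dev_two_pass(shifted))
```

For small integers like the speeds list both versions agree closely, so the shortcut is fine here. The two-pass form is the safer default when the values are large.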