How to calculate standard deviation in Python when x and P(x) are known
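I have the following data and I want to calculate the standard deviation. The values of x and the probability of each x are given.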
x | P(x) |
---|---|
-2000 | 0.1 |
-1000 | 0.1 |
0 | 0.2 |
1000 | 0.2 |
2000 | 0.3 |
3000 | 0.1 |
I know how to calculate it by hand: compute Var(x) = E[x^2] - (E[x])^2, then take sqrt(Var(x)).
[This is how it is done by hand]
How can I calculate this in Python?
Try this:
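import math
import pandas as pd

# x values and their probabilities, taken from the question
df = pd.DataFrame({'x': [-2000, -1000, 0, 1000, 2000, 3000],
                   'P(x)': [0.1, 0.1, 0.2, 0.2, 0.3, 0.1]})

df['x_squared'] = df['x']**2
df['E_of_x_squared'] = df['x_squared'] * df['P(x)']  # terms x^2 * P(x)
df['E_of_x'] = df['x'] * df['P(x)']                  # terms x * P(x)
sum_E_x_square = df['E_of_x_squared'].sum()          # E[x^2]
square_of_E_x_sum = df['E_of_x'].sum()**2            # (E[x])^2
var = sum_E_x_square - square_of_E_x_sum
std_dev = math.sqrt(var)
print('Standard Deviation is: ' + str(std_dev))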
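If a DataFrame is not needed, the same calculation can be sketched with NumPy alone. This is only a minimal sketch: the array names `x` and `p` are illustrative and not from the answer above, and it assumes the probabilities sum to 1. It uses `np.average` with `weights` to get E[x] and then the probability-weighted squared deviation from that mean, which is equivalent to E[x^2] - (E[x])^2.

```python
import numpy as np

# Values and probabilities from the question (names x and p are illustrative)
x = np.array([-2000, -1000, 0, 1000, 2000, 3000])
p = np.array([0.1, 0.1, 0.2, 0.2, 0.3, 0.1])

# Sanity check: a probability distribution should sum to 1
assert np.isclose(p.sum(), 1.0)

mean = np.average(x, weights=p)                    # E[x]
variance = np.average((x - mean) ** 2, weights=p)  # E[(x - E[x])^2]
std_dev = np.sqrt(variance)

print('Standard Deviation is: {:.2f}'.format(std_dev))
```

For the data in the question this should print 1469.69, matching the pandas approach.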
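To clarify: if all 6 values are assumed to be equally likely, the (population) standard deviation of [1000, 2000, 3000, 0, -1000, -2000] is indeed 1707.8.
In the post, however, the 6 values have unequal probabilities [0.1, 0.1, 0.2, 0.2, 0.3, 0.1]: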
import pandas as pd

df = pd.DataFrame([
    {'x': -2000, 'P(x)': 0.1},
    {'x': -1000, 'P(x)': 0.1},
    {'x': 0, 'P(x)': 0.2},
    {'x': 1000, 'P(x)': 0.2},
    {'x': 2000, 'P(x)': 0.3},
    {'x': 3000, 'P(x)': 0.1}])
df['E(x)'] = df['x'] * df['P(x)']        # x * P(x); the column sums to E(x)
df['E(x^2)'] = df['x']**2 * df['P(x)']   # x^2 * P(x); the column sums to E(x^2)
variance = df['E(x^2)'].sum() - df['E(x)'].sum()**2
std_dev = variance**0.5
print(df)  # in a Jupyter notebook, display(df) also works
print('Standard Deviation is: {:.2f}'.format(std_dev))
Output:
      x  P(x)   E(x)     E(x^2)
0 -2000   0.1 -200.0   400000.0
1 -1000   0.1 -100.0   100000.0
2     0   0.2    0.0        0.0
3  1000   0.2  200.0   200000.0
4  2000   0.3  600.0  1200000.0
5  3000   0.1  300.0   900000.0
Standard Deviation is: 1469.69
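For reference, the arithmetic behind that number can be written out from the two column sums. This is just the formula from the question applied to the table values, with \sigma denoting the standard deviation:

```latex
\begin{aligned}
E[x]   &= \sum_i x_i \, P(x_i) = -200 - 100 + 0 + 200 + 600 + 300 = 800 \\
E[x^2] &= \sum_i x_i^2 \, P(x_i) = 400000 + 100000 + 0 + 200000 + 1200000 + 900000 = 2800000 \\
\mathrm{Var}(x) &= E[x^2] - (E[x])^2 = 2800000 - 800^2 = 2160000 \\
\sigma &= \sqrt{\mathrm{Var}(x)} = \sqrt{2160000} \approx 1469.69
\end{aligned}
```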
To verify, you can use https://www.rapidtables.com/calc/math/standard-deviation-calculator.html
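Both figures quoted in this answer (1707.8 for equal probabilities, 1469.69 for the given ones) can also be reproduced locally. The following is a rough sketch using only the standard library; the variable names `xs`, `ps`, `equal_std` and `weighted_std` are illustrative, and `statistics.pstdev` is used here on the assumption that the equal-probability figure is a population standard deviation.

```python
import math
import statistics

# Values and probabilities from the question (names are illustrative)
xs = [-2000, -1000, 0, 1000, 2000, 3000]
ps = [0.1, 0.1, 0.2, 0.2, 0.3, 0.1]

# Equal probabilities: ordinary population standard deviation
equal_std = statistics.pstdev(xs)

# Given probabilities: weight each value by P(x)
mean = sum(x * p for x, p in zip(xs, ps))        # E[x]
mean_sq = sum(x**2 * p for x, p in zip(xs, ps))  # E[x^2]
weighted_std = math.sqrt(mean_sq - mean**2)

print('Equal probabilities: {:.1f}'.format(equal_std))
print('Given probabilities: {:.2f}'.format(weighted_std))
```

The gap between the two results comes entirely from the weights: the given distribution puts more mass on 2000 and less on the extremes than a uniform one does.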