Formula to recalculate population variance after removing a value

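Suppose I have a data set {10, 20, 30}. Its mean and variance are mean = 20 and variance = 66.667. If I remove 10 from the data set, turning it into {20, 30}, is there a formula that lets me calculate the new variance?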

This question is similar to https://math.stackexchange.com/questions/3112650/formula-to-recalculate-variance-after-removing-a-value-and-adding-another-one-gi which deals with the case where the removed value is replaced by another one. https://math.stackexchange.com/questions/775391/can-i-calculate-the-new-standard-deviation-when-adding-a-value-without-knowing-t is also a similar question, except that it deals with adding a value instead of removing one. Another similar question deals with removing a value from a sample, but I don't know how to modify it to handle a population.

To calculate the mean and variance we need 3 parameters:

N   - number of items 
Sx  - sum of items
Sxx - sum of squared items

With all these values, we can find the mean and variance as

Mean     = Sx / N
Variance = Sxx / N - Sx * Sx / N / N
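
For reference, the variance expression above follows from expanding the definition of the population variance. The symbols x_i (the individual items), mu (the mean, equal to Sx / N) and sigma^2 (the variance) are notation introduced here for illustration only:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i \;+\; \mu^2 \\
         &= \frac{S_{xx}}{N} - 2\mu^2 + \mu^2 \\
         &= \frac{S_{xx}}{N} - \left(\frac{S_x}{N}\right)^2
\end{aligned}
```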

In your case

items    = {10, 20, 30}

N        = 3
Sx       = 60   = 10 + 20 + 30
Sxx      = 1400 = 100 + 400 + 900 = 10 * 10 + 20 * 20 + 30 * 30  

Mean     = 60 / 3 = 20
Variance = 1400 / 3 - 60 * 60 / 3 / 3 = 66.666667  

If you want to remove an item, just update the N, Sx, Sxx values and calculate the new variance:

item      = 10

N'        = N - 1             = 3 - 1 = 2
Sx'       = Sx - item         = 60 - 10 = 50
Sxx'      = Sxx - item * item = 1400 - 10 * 10 = 1300

Mean'     = Sx' / N' = 50 / 2 = 25
Variance' = Sxx' / N' - Sx' * Sx' / N' / N' = 1300 / 2 - 50 * 50 / 2 / 2 = 25
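
The update steps above can be sketched in Python. This is only a minimal illustration: the class name `RunningStats` and its methods are hypothetical, and the code assumes the caller only removes values that are actually in the data set.

```python
from dataclasses import dataclass


@dataclass
class RunningStats:
    """Keeps N, Sx and Sxx so mean and population variance are cheap to update."""
    n: int = 0        # N   - number of items
    sx: float = 0.0   # Sx  - sum of items
    sxx: float = 0.0  # Sxx - sum of squared items

    def add(self, item: float) -> None:
        self.n += 1
        self.sx += item
        self.sxx += item * item

    def remove(self, item: float) -> None:
        # Assumption: `item` really is in the data set; this is not checked.
        if self.n <= 1:
            raise ValueError("at least one item must remain after removal")
        self.n -= 1
        self.sx -= item
        self.sxx -= item * item

    def mean(self) -> float:
        return self.sx / self.n

    def variance(self) -> float:
        # Population variance: Sxx / N - (Sx / N)^2
        return self.sxx / self.n - (self.sx / self.n) ** 2


stats = RunningStats()
for value in (10, 20, 30):
    stats.add(value)
print(stats.mean(), stats.variance())  # mean 20.0, variance about 66.667

stats.remove(10)
print(stats.mean(), stats.variance())  # prints "25.0 25.0"
```

One caveat with this approach: when the values are large and the variance is small, subtracting two nearly equal numbers in `variance()` can lose floating-point precision, so for such data a more careful update scheme may be preferable.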

So, if you remove item = 10, the new mean and variance will be

Mean'     = 25
Variance' = 25
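
If only N, the old mean and the old variance are known (rather than Sx and Sxx), the same update can be written as a closed form by substituting Sx = N * Mean and Sxx = N * (Variance + Mean * Mean). In the sketch below, mu and sigma^2 stand for the old mean and variance, x for the removed item, and primes for the updated values; this notation is introduced here for illustration:

```latex
\begin{aligned}
\mu'       &= \frac{N\mu - x}{N-1} \\
\sigma'^2  &= \frac{N(\sigma^2+\mu^2) - x^2}{N-1} - \mu'^2
            = \frac{N}{N-1}\left(\sigma^2 - \frac{(x-\mu)^2}{N-1}\right)
\end{aligned}
```

With N = 3, mu = 20, sigma^2 = 66.667 and x = 10 this gives mu' = 50 / 2 = 25 and sigma'^2 = 1.5 * (66.667 - 100 / 2) = 25, matching the result obtained from the sums.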