基于顺序抽样的置​​信区间

confidence interval based on sequentional sampling

当我想要从 N(1,2) 中按顺序获取 100 个样本并按顺序估计均值和标准差时,我可以这样做:

sample = np.random.normal(1,2,(100, 1))
sample_mean = []
for i,_ in enumerate(sample):
    sample_mean.append(sample[:i+1,:].ravel().mean())
    sample_std.append(sample[:i+1,:].ravel().std())

但是如果我想计算这些连续样本的置信区间,我该怎么做呢?

您可以使用 OpenTURNS 来完成。以下代码计算置信度为 0.9 的参数(均值和标准差)的渐近双边置信区间。

import numpy as np
import openturns as ot
sample = np.random.normal(1,2,(100, 1))
confidence_level = 0.9
sample_mean_ci = []
sample_std_ci = []
nor = ot.NormalFactory() # class that fits a Normal distribution on the data
for i in range(1, len(sample)):
    estimated_params = nor.buildEstimator(sample[:i+1,:]).getParameterDistribution()
    ci = estimated_params.computeBilateralConfidenceInterval(confidence_level)
    sample_mean_ci.append([ci.getLowerBound()[0], ci.getUpperBound()[0]])
    sample_std_ci.append([ci.getLowerBound()[1], ci.getUpperBound()[1]])

可以将样本均值的 CI 计算为 sampleMean +/- tstat * std/sqrt(n)

和 CI 的样本标准偏差作为 ( (sampleSize-1)*std^2/chisq(a/2), (sampleSize-1)*std^2/chisq(1-a/2) ) 的平方根。

这是一个示例:

import numpy as np
import scipy.stats as stats
from scipy.stats import chi2

n_series = 100
sample = np.random.normal(0,1,n_series)
sample_mean, sample_std = [], []
sample_mean_CI, sample_std_CI = [], []
alpha = 0.1 # e.g. alpha = 0.1 for 90-percent CI, alpha=0.05 for 95-percent CI

def mean_div(std, n, alpha): return stats.t.isf(alpha/2, n) * (std/np.sqrt(n))
def mean_ci(xbar, std, n, alpha): 
    div = mean_div(std, n, alpha)
    return (xbar - div, xbar + div)

def std_lower(std, n, alpha): return np.sqrt(((n-1)*std**2)/chi2.isf(alpha/2, n-1))
def std_upper(std, n, alpha): return np.sqrt(((n-1)*std**2)/chi2.isf(1-alpha/2, n-1))
def std_ci(std, n, alpha): return (std_lower(std, n, alpha), std_upper(std, n, alpha))

for i,_ in enumerate(sample):
    x = sample[:i+1]
    xbar = x.mean()
    std = x.std()
    sample_mean.append(xbar)
    sample_std.append(std)
    sample_mean_CI.append(mean_ci(xbar, std, i+1, alpha))
    sample_std_CI.append(std_ci(std, i+1, alpha))
    
# check for convergence for the whole series
print(sample_mean_CI[-1])
print(sample_std_CI[-1])

可以改变数据系列的数量and/or正态分布的均值和标准参数来进行试验。