Generating random samples for each sample length and calculating the standard deviation of the means

Question

I am trying to answer this question:

Assume that a sample is created from a standard normal distribution (μ= 0,σ= 1). Take sample lengths ranging from N = 1 to 600. For each sample length, draw 5000 samples and estimate the mean from each of the samples. Find the standard deviation from these means, and show that the standard deviation corresponds to a square root reduction.

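I'm not sure I'm interpreting the question correctly, but my goal is to find the standard deviation of the means for each sample length, and then show that the standard deviation decreases like a square-root reduction: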

Here is what I have so far (does what I'm doing make sense for this question?):

First, set up the normal distribution and draw a simple plot for reference:

import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

n = np.arange(1,401,1)

mu = 0

sigma = 1
x = np.linspace(mu - 4*sigma, mu + 4*sigma, 100)
pdf = stats.norm.pdf(x, mu, sigma)

# plot normal distribution
plt.plot(x,pdf)
plt.show()

Now loop over the sample lengths and compute the means and standard deviations:

sample_means = []
sample_stdevs = []

for i in range(400):
    # 1000 random integers drawn uniformly from 1 to 399 (randint's upper bound is exclusive)
    rand_list = np.random.randint(1,400,1000)
    sample_means.append(np.mean(rand_list))

    # standard deviation of all the means collected so far
    sample_stdevs.append(np.std(sample_means))


plt.plot(sample_stdevs)

Does this make sense? I'm also confused about the square-root reduction part.

Answer 1

Take sample lengths ranging from N = 1 to 400. For each sample length, draw 1000 samples and estimate the mean from each of the samples.

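A sample of length 200 means drawing 200 sample points and taking their mean. Do this 1000 times for N = 200 and you have 1000 means. Compute the standard deviation of these 1000 means; it tells you how spread out the means are. Do this for every N to see how the spread changes with the sample length.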
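For a single sample length, the procedure looks roughly like the sketch below. It assumes NumPy's `Generator` API (`np.random.default_rng`) instead of the legacy `np.random.normal`, and the seed is arbitrary; the printed spread should land close to 1/sqrt(200) ≈ 0.0707.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # arbitrary seed, only for reproducibility
N, n_draws = 200, 1000               # sample length and number of repetitions

# Each of the 1000 rows is one sample of 200 standard-normal points.
samples = rng.standard_normal((n_draws, N))
means = samples.mean(axis=1)   # 1000 means, one per sample
spread = means.std()           # standard deviation of those 1000 means

print(f"std of the means for N = {N}: {spread:.4f}")
print(f"1 / sqrt(N):               {1 / np.sqrt(N):.4f}")
```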

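The idea is that if you draw only 5 samples, their mean will probably not be a good approximation of 0. If you collect 1000 of these means, they will vary a lot and you get a wide spread. If you draw larger samples, then by the law of large numbers the mean will be very close to 0, and this is reproducible even if you do it 1000 times. So the spread of these means will be smaller.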
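One way to see this is to count how many of the means land near 0 for a small and a large sample length. In this sketch the ±0.1 threshold is a hypothetical choice for "close to 0" and the seed is arbitrary. With σ = 1, roughly 18% of the means should fall inside that band for N = 5, and nearly all of them for N = 1000.

```python
import numpy as np

rng = np.random.default_rng(seed=2)  # arbitrary seed, only for reproducibility
n_draws = 1000
tolerance = 0.1  # hypothetical threshold for "close to 0"

def fraction_near_zero(N):
    """Fraction of n_draws sample means (samples of length N) within ±tolerance of 0."""
    means = rng.standard_normal((n_draws, N)).mean(axis=1)
    return np.mean(np.abs(means) < tolerance)

frac_5 = fraction_near_zero(5)
frac_1000 = fraction_near_zero(1000)
print(f"N = 5:    {frac_5:.3f} of the means lie within ±{tolerance}")
print(f"N = 1000: {frac_1000:.3f} of the means lie within ±{tolerance}")
```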

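The standard deviation of the means (also called the standard error of the mean) is the standard deviation of the population (σ = 1 in our case) divided by the square root of the sample size we draw. See the wiki article for a derivation.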
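The short version of that derivation, assuming the N draws X_1, ..., X_N are independent with common variance σ², is:

```latex
\operatorname{Var}(\bar X_N)
  = \operatorname{Var}\Big(\frac{1}{N}\sum_{i=1}^{N} X_i\Big)
  = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(X_i)
  = \frac{N\sigma^2}{N^2}
  = \frac{\sigma^2}{N},
\qquad
\operatorname{SD}(\bar X_N) = \frac{\sigma}{\sqrt{N}}
```

With σ = 1 this gives 1 for N = 1, 0.5 for N = 4 and 0.1 for N = 100: quadrupling the sample length only halves the spread, which is the "square root reduction" the question asks about.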

import numpy as np
import matplotlib.pyplot as plt

stdevs = []
lengths = np.arange(1, 401)

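for length in lengths:
    # Each of the 1000 columns is one sample of `length` points
    # (mean = 0, std = 1 by default)
    sample = np.random.normal(size=(length, 1000))
    # Average down each column to get 1000 means, then take their spread
    stdevs.append(sample.mean(axis=0).std())

plt.plot(lengths, stdevs)
plt.plot(lengths, 1 / np.sqrt(lengths))  # theory: sigma / sqrt(N) with sigma = 1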
plt.legend(['Sampling', 'Theory'])
plt.show()

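Output: [plot of the standard deviation of the means against the sample length N, with the legend entries "Sampling" and "Theory"]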
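To show the square-root reduction numerically rather than only by eye, one option is to fit a straight line to log(std) against log(N): a σ/√N law gives a slope of about -0.5 and an intercept of about log(σ) = 0. The sketch below uses the question's parameters (N = 1 to 600, 5000 draws) and a cumulative-sum shortcut that is my own choice, not part of the answer above: each column is one long standard-normal sequence, and its first N entries serve as the sample of length N. Samples for neighbouring N therefore overlap, so their estimates are correlated, but for any fixed N the 5000 means are still independent. The seed is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(seed=0)  # arbitrary seed, only for reproducibility
max_len, n_draws = 600, 5000         # parameters from the question

# Column j is one long standard-normal sequence; its first N entries
# are the j-th sample of length N.
x = rng.standard_normal((max_len, n_draws))

lengths = np.arange(1, max_len + 1)
# Running means: row N-1 holds the 5000 means of the samples of length N.
means = np.cumsum(x, axis=0) / lengths[:, None]
stdevs = means.std(axis=1)

# Fit log(std) = slope * log(N) + intercept; sigma / sqrt(N) predicts slope -0.5.
slope, intercept = np.polyfit(np.log(lengths), np.log(stdevs), 1)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

The same arrays can be passed to `plt.plot(lengths, stdevs)` and `plt.plot(lengths, 1 / np.sqrt(lengths))` to reproduce the comparison plot.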
