Generate random samples for each sample length for a distribution

My goal is to draw 500 sample points, take their mean, and then repeat this 6000 times from the distribution. Basically:

Take sample lengths ranging from N = 1 to 500. For each sample length, draw 6000 samples and estimate the mean from each of the samples. Calculate the standard deviation from these means for each sample length, and show graphically that the decrease in standard deviation corresponds to a square root reduction.
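
(In other words: for i.i.d. draws with standard deviation sigma, the standard deviation of the mean of N draws is sigma / sqrt(N), so the plotted standard deviations should fall off like 1 / sqrt(N); for a gamma distribution with shape k and scale theta, sigma = theta * sqrt(k).)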

I'm trying to do this with a gamma distribution, but all of my standard deviations come out as zero... and I'm not sure why.

Here is the program so far:

import math
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
from scipy.stats import gamma


# now taking random gamma samples 

stdevs = []
length = np.arange(1, 401,1)

mean=[]
for i in range(400):
    sample = np.random.gamma(shape=i,size=1000)
    mean.append(np.mean(sample))
    stdevs.append(np.std(mean))


      
# then trying to plot the standard deviations but it's just a line..
# thought there should be a decrease

plt.plot(length, stdevs, label='sampling')
plt.show()
  

I think the standard deviation should decrease, not increase. What might I be doing wrong when trying to draw 1000 samples from the gamma distribution and estimate the mean and standard deviation?

The problem is in the line stdevs.append(np.std(sample.mean(axis=0)))

That takes the standard deviation of a single value, namely the mean of the sample array, so it will always be 0.

You need to pass np.std() all of the values in the sample, not just its mean.

stdevs.append(np.std(sample)) will give you the standard deviation of each sample you draw.
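
A quick way to see the difference (a minimal sketch; the shape of 2.0 and the sample size of 1000 are just example values):

import numpy as np

sample = np.random.gamma(shape=2.0, size=1000)

# the mean of the whole sample is a single scalar, so its std is always 0
print(np.std(sample.mean(axis=0)))   # 0.0

# the std over all 1000 draws is roughly sqrt(2) here (shape=2, scale=1)
print(np.std(sample))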

I think you have misused shape. shape is the shape parameter of the distribution, not the number of independent draws.

import numpy as np
import matplotlib.pyplot as plt

# Reproducible
gen = np.random.default_rng(20210513)

# Generate 400 (max sample size) by 1000 (number of indep samples)
sample = gen.gamma(shape=2, size=(400, 1000))
# Use cumsum to compute the cumulative sum
means = np.cumsum(sample, axis=0)
# Divide the cumulative sums by the number of observations used in each mean.
# A little care is needed to get broadcasting right: the (400, 1) column of
# counts broadcasts against the (400, 1000) array of cumulative sums.
means = means / np.arange(1, 401)[:, None]

# Compute the std dev using the observations in each row
stdevs = means.std(axis=1)

# Plot
plt.plot(np.arange(1, 401), stdevs, label='sampling')
plt.show()

This produces the plot.
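
To check the square-root decrease asked about in the question, one option is to overlay the theoretical curve on the same plot. This is a sketch continuing from the code above: with shape=2 and the default scale of 1, the population standard deviation is sqrt(2), so the standard deviation of the mean of n draws should be sqrt(2) / sqrt(n).

# continuing from the code above
n = np.arange(1, 401)
plt.plot(n, stdevs, label='sampling')
plt.plot(n, np.sqrt(2) / np.sqrt(n), label='sqrt(2)/sqrt(n)')
plt.legend()
plt.show()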