计算 Monte Carlo 模拟的确定性

Calculate certainty of Monte Carlo simulation

假设我们使用 Monte Carlo 方法来估计物体的面积,与 use it to estimate the value of π.

完全相同

现在,假设我们要计算模拟结果的确定性。我们投射了 n 个样本,其中 m 个样本落在物体内部,因此物体的面积约为 m/n 的总采样面积。我们想发表这样的声明:

"我们99%确定物体的面积在a1和a2之间。 "

我们如何计算上面的a1和a2(给定nm,总面积,以及想要的确定性)?

这是一个程序,它试图用数字来估计这个界限。这里的样本是 [0,1) 中的点,对象是片段 [0.25,0.75)。它打印 a1 和 a2 50%、90% 和 99%,样本计数范围:

import std.algorithm;
import std.random;
import std.range;
import std.stdio;

void main()
{
    foreach (numSamples; iota(0, 1000+1, 100).filter!(n => n > 0))
    {
        auto samples = new double[numSamples];
        enum objectStart = 0.25;
        enum objectEnd   = 0.75;

        enum numTotalSamples = 10_000_000;
        auto numSizes = numTotalSamples / numSamples;
        auto sizes = new double[numSizes];
        foreach (ref size; sizes)
        {
            size_t numHits;
            foreach (i; 0 .. numSamples)
            {
                auto sample = uniform01!double;
                if (sample >= objectStart && sample < objectEnd)
                    numHits++;
            }

            size = 1.0 / numSamples * numHits;
        }

        sizes.sort;
        writef("%d samples:", numSamples);
        foreach (certainty; [50, 90, 99])
        {
            auto centerDist = numSizes * certainty / 100 / 2;
            auto startPos = numSizes / 2 - centerDist;
            auto endPos   = numSizes / 2 + centerDist;
            writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
        }
        writeln;
    }
}

(Run it online.) 它输出:

//                     50%                 90%                 99%
100 samples:    0.47000..0.53000    0.42000..0.58000    0.37000..0.63000
200 samples:    0.47500..0.52500    0.44500..0.56000    0.41000..0.59000
300 samples:    0.48000..0.52000    0.45333..0.54667    0.42667..0.57333
400 samples:    0.48250..0.51750    0.46000..0.54250    0.43500..0.56500
500 samples:    0.48600..0.51600    0.46400..0.53800    0.44200..0.55800
600 samples:    0.48667..0.51333    0.46667..0.53333    0.44833..0.55167
700 samples:    0.48714..0.51286    0.46857..0.53143    0.45000..0.54857
800 samples:    0.48750..0.51250    0.47125..0.53000    0.45375..0.54625
900 samples:    0.48889..0.51111    0.47222..0.52667    0.45778..0.54111
1000 samples:   0.48900..0.51000    0.47400..0.52500    0.45800..0.53900

是否可以精确计算这些数字?

(上下文:我想向 btdu 添加类似“±X.Y GB,99% 确定性”)

好的,问题与语言无关,下面是如何使用 Monte-Carlo 进行误差估计的说明。

假设,你想计算积分

I = S01 f(x) dx

其中 f(x) 是简单的多项式函数

f(x) = xn

Here是计算的说明。

为此,您不仅要计算平均值,还要计算标准差。

然后,知道 Monte Carlo 误差随着样本数量的平方根倒数而下降,计算置信区间就很简单了

代码,Python3.7,Windows10 x64

import numpy as np

rng = np.random.default_rng()

N = 100000

n = 2

def f(x):
    return np.power(x, n)

sample = f(rng.random(N)) # N samples of the function

m = np.mean(sample)        # mean value of the sample, approaching integral value as N->∞
s = np.std(sample, ddof=1) # standard deviation with Bessel correction

e = s / np.sqrt(N) # Monte Carlo error decreases as inverse square root

t = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
#t = 3.00 # For 99.7% confidence interval, we should take 3 sigma, per Gaussian distribution

print(f'True integral value is {1.0/(1.0+n)}')
print(f'Computed integral value is in the range [{m-t*e}...{m+t*e}] with 99% confidence')

会打印类似

的内容
True integral value is 0.3333333333333333
Computed integral value is in the range 
[0.33141772204489295...0.3362795491124624] with 99% confidence

您可以使用 Z-score table,第 this one 行,来打印您想要的 table。您可以改变 N 以获得所需的 N 依赖性

zscore = {'50%': 0.674, '80%': 1.282, '90%': 1.645, '95%': 1.960, '98%': 2.326, '99%': 2.576, '99.7%': 3.0}
for c, z in zscore.items():
    print(f'Computed integral value is in the range [{m-z*e}...{m+z*e}] with {c} confidence')

根据 Severin 的回答,这里是计算问题中所述值的代码:

def calculate_error(n, m, z):
    p = m / n
    std_dev = (p * (1 - p)) ** 0.5 # Standard deviation of Bernoulli variable

    error = std_dev / n ** 0.5 # Monte Carlo error decreases as inverse square root
    return (mean - z * error, mean + z * error)

n = 1000
z = 2.576  # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
print(calculate_error(n, n * 0.5, z))