计算 Monte Carlo 模拟的确定性

Question

假设我们使用 Monte Carlo 方法来估计物体的面积，与 use it to estimate the value of π.

完全相同

现在，假设我们要计算模拟结果的确定性。我们投射了 n 个样本，其中 m 个样本落在物体内部，因此物体的面积约为 m/n 的总采样面积。我们想发表这样的声明：

"我们99%确定物体的面积在a₁和a₂之间。 "

我们如何计算上面的a₁和a₂（给定n，m，总面积，以及想要的确定性）?

这是一个程序，它试图用数字来估计这个界限。这里的样本是 [0,1) 中的点，对象是片段 [0.25,0.75)。它打印 a₁ 和 a₂ 50%、90% 和 99%，样本计数范围：

import std.algorithm;
import std.random;
import std.range;
import std.stdio;

void main()
{
    foreach (numSamples; iota(0, 1000+1, 100).filter!(n => n > 0))
    {
        auto samples = new double[numSamples];
        enum objectStart = 0.25;
        enum objectEnd   = 0.75;

        enum numTotalSamples = 10_000_000;
        auto numSizes = numTotalSamples / numSamples;
        auto sizes = new double[numSizes];
        foreach (ref size; sizes)
        {
            size_t numHits;
            foreach (i; 0 .. numSamples)
            {
                auto sample = uniform01!double;
                if (sample >= objectStart && sample < objectEnd)
                    numHits++;
            }

            size = 1.0 / numSamples * numHits;
        }

        sizes.sort;
        writef("%d samples:", numSamples);
        foreach (certainty; [50, 90, 99])
        {
            auto centerDist = numSizes * certainty / 100 / 2;
            auto startPos = numSizes / 2 - centerDist;
            auto endPos   = numSizes / 2 + centerDist;
            writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
        }
        writeln;
    }
}

(Run it online.) 它输出：

//                     50%                 90%                 99%
100 samples:    0.47000..0.53000    0.42000..0.58000    0.37000..0.63000
200 samples:    0.47500..0.52500    0.44500..0.56000    0.41000..0.59000
300 samples:    0.48000..0.52000    0.45333..0.54667    0.42667..0.57333
400 samples:    0.48250..0.51750    0.46000..0.54250    0.43500..0.56500
500 samples:    0.48600..0.51600    0.46400..0.53800    0.44200..0.55800
600 samples:    0.48667..0.51333    0.46667..0.53333    0.44833..0.55167
700 samples:    0.48714..0.51286    0.46857..0.53143    0.45000..0.54857
800 samples:    0.48750..0.51250    0.47125..0.53000    0.45375..0.54625
900 samples:    0.48889..0.51111    0.47222..0.52667    0.45778..0.54111
1000 samples:   0.48900..0.51000    0.47400..0.52500    0.45800..0.53900

是否可以精确计算这些数字？

（上下文：我想向 btdu 添加类似“±X.Y GB，99% 确定性”）

Answer 1

好的，问题与语言无关，下面是如何使用 Monte-Carlo 进行误差估计的说明。

假设，你想计算积分

I = S₀¹ f(x) dx

其中 f(x) 是简单的多项式函数

f(x) = xⁿ

Here是计算的说明。

为此，您不仅要计算平均值，还要计算标准差。

然后，知道 Monte Carlo 误差随着样本数量的平方根倒数而下降，计算置信区间就很简单了

代码，Python3.7，Windows10 x64

import numpy as np

rng = np.random.default_rng()

N = 100000

n = 2

def f(x):
    return np.power(x, n)

sample = f(rng.random(N)) # N samples of the function

m = np.mean(sample)        # mean value of the sample, approaching integral value as N->∞
s = np.std(sample, ddof=1) # standard deviation with Bessel correction

e = s / np.sqrt(N) # Monte Carlo error decreases as inverse square root

t = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
#t = 3.00 # For 99.7% confidence interval, we should take 3 sigma, per Gaussian distribution

print(f'True integral value is {1.0/(1.0+n)}')
print(f'Computed integral value is in the range [{m-t*e}...{m+t*e}] with 99% confidence')

会打印类似

的内容

True integral value is 0.3333333333333333
Computed integral value is in the range 
[0.33141772204489295...0.3362795491124624] with 99% confidence

您可以使用 Z-score table，第 this one 行，来打印您想要的 table。您可以改变 N 以获得所需的 N 依赖性

zscore = {'50%': 0.674, '80%': 1.282, '90%': 1.645, '95%': 1.960, '98%': 2.326, '99%': 2.576, '99.7%': 3.0}
for c, z in zscore.items():
    print(f'Computed integral value is in the range [{m-z*e}...{m+z*e}] with {c} confidence')

Answer 2

根据 Severin 的回答，这里是计算问题中所述值的代码：

def calculate_error(n, m, z):
    p = m / n
    std_dev = (p * (1 - p)) ** 0.5 # Standard deviation of Bernoulli variable

    error = std_dev / n ** 0.5 # Monte Carlo error decreases as inverse square root
    return (mean - z * error, mean + z * error)

n = 1000
z = 2.576  # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
print(calculate_error(n, n * 0.5, z))

计算 Monte Carlo 模拟的确定性

Calculate certainty of Monte Carlo simulation

probability

montecarlo