计算 Monte Carlo 模拟的确定性
Calculate certainty of Monte Carlo simulation
假设我们使用 Monte Carlo 方法来估计物体的面积,与 use it to estimate the value of π.
完全相同
现在,假设我们要计算模拟结果的确定性。我们投射了 n 个样本,其中 m 个样本落在物体内部,因此物体的面积约为 m/n 的总采样面积。我们想发表这样的声明:
"我们99%确定物体的面积在a1和a2之间。 "
我们如何计算上面的a1和a2(给定n,m,总面积,以及想要的确定性)?
这是一个程序,它试图用数字来估计这个界限。这里的样本是 [0,1) 中的点,对象是片段 [0.25,0.75)。它打印 a1 和 a2 50%、90% 和 99%,样本计数范围:
import std.algorithm;
import std.random;
import std.range;
import std.stdio;
void main()
{
foreach (numSamples; iota(0, 1000+1, 100).filter!(n => n > 0))
{
auto samples = new double[numSamples];
enum objectStart = 0.25;
enum objectEnd = 0.75;
enum numTotalSamples = 10_000_000;
auto numSizes = numTotalSamples / numSamples;
auto sizes = new double[numSizes];
foreach (ref size; sizes)
{
size_t numHits;
foreach (i; 0 .. numSamples)
{
auto sample = uniform01!double;
if (sample >= objectStart && sample < objectEnd)
numHits++;
}
size = 1.0 / numSamples * numHits;
}
sizes.sort;
writef("%d samples:", numSamples);
foreach (certainty; [50, 90, 99])
{
auto centerDist = numSizes * certainty / 100 / 2;
auto startPos = numSizes / 2 - centerDist;
auto endPos = numSizes / 2 + centerDist;
writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
}
writeln;
}
}
(Run it online.) 它输出:
// 50% 90% 99%
100 samples: 0.47000..0.53000 0.42000..0.58000 0.37000..0.63000
200 samples: 0.47500..0.52500 0.44500..0.56000 0.41000..0.59000
300 samples: 0.48000..0.52000 0.45333..0.54667 0.42667..0.57333
400 samples: 0.48250..0.51750 0.46000..0.54250 0.43500..0.56500
500 samples: 0.48600..0.51600 0.46400..0.53800 0.44200..0.55800
600 samples: 0.48667..0.51333 0.46667..0.53333 0.44833..0.55167
700 samples: 0.48714..0.51286 0.46857..0.53143 0.45000..0.54857
800 samples: 0.48750..0.51250 0.47125..0.53000 0.45375..0.54625
900 samples: 0.48889..0.51111 0.47222..0.52667 0.45778..0.54111
1000 samples: 0.48900..0.51000 0.47400..0.52500 0.45800..0.53900
是否可以精确计算这些数字?
(上下文:我想向 btdu 添加类似“±X.Y GB,99% 确定性”)
好的,问题与语言无关,下面是如何使用 Monte-Carlo 进行误差估计的说明。
假设,你想计算积分
I = S01 f(x) dx
其中 f(x) 是简单的多项式函数
f(x) = xn
Here是计算的说明。
为此,您不仅要计算平均值,还要计算标准差。
然后,知道 Monte Carlo 误差随着样本数量的平方根倒数而下降,计算置信区间就很简单了
代码,Python3.7,Windows10 x64
import numpy as np
rng = np.random.default_rng()
N = 100000
n = 2
def f(x):
return np.power(x, n)
sample = f(rng.random(N)) # N samples of the function
m = np.mean(sample) # mean value of the sample, approaching integral value as N->∞
s = np.std(sample, ddof=1) # standard deviation with Bessel correction
e = s / np.sqrt(N) # Monte Carlo error decreases as inverse square root
t = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
#t = 3.00 # For 99.7% confidence interval, we should take 3 sigma, per Gaussian distribution
print(f'True integral value is {1.0/(1.0+n)}')
print(f'Computed integral value is in the range [{m-t*e}...{m+t*e}] with 99% confidence')
会打印类似
的内容
True integral value is 0.3333333333333333
Computed integral value is in the range
[0.33141772204489295...0.3362795491124624] with 99% confidence
您可以使用 Z-score table,第 this one 行,来打印您想要的 table。您可以改变 N 以获得所需的 N 依赖性
zscore = {'50%': 0.674, '80%': 1.282, '90%': 1.645, '95%': 1.960, '98%': 2.326, '99%': 2.576, '99.7%': 3.0}
for c, z in zscore.items():
print(f'Computed integral value is in the range [{m-z*e}...{m+z*e}] with {c} confidence')
根据 Severin 的回答,这里是计算问题中所述值的代码:
def calculate_error(n, m, z):
p = m / n
std_dev = (p * (1 - p)) ** 0.5 # Standard deviation of Bernoulli variable
error = std_dev / n ** 0.5 # Monte Carlo error decreases as inverse square root
return (mean - z * error, mean + z * error)
n = 1000
z = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
print(calculate_error(n, n * 0.5, z))
假设我们使用 Monte Carlo 方法来估计物体的面积,与 use it to estimate the value of π.
完全相同现在,假设我们要计算模拟结果的确定性。我们投射了 n 个样本,其中 m 个样本落在物体内部,因此物体的面积约为 m/n 的总采样面积。我们想发表这样的声明:
"我们99%确定物体的面积在a1和a2之间。 "
我们如何计算上面的a1和a2(给定n,m,总面积,以及想要的确定性)?
这是一个程序,它试图用数字来估计这个界限。这里的样本是 [0,1) 中的点,对象是片段 [0.25,0.75)。它打印 a1 和 a2 50%、90% 和 99%,样本计数范围:
import std.algorithm;
import std.random;
import std.range;
import std.stdio;
void main()
{
foreach (numSamples; iota(0, 1000+1, 100).filter!(n => n > 0))
{
auto samples = new double[numSamples];
enum objectStart = 0.25;
enum objectEnd = 0.75;
enum numTotalSamples = 10_000_000;
auto numSizes = numTotalSamples / numSamples;
auto sizes = new double[numSizes];
foreach (ref size; sizes)
{
size_t numHits;
foreach (i; 0 .. numSamples)
{
auto sample = uniform01!double;
if (sample >= objectStart && sample < objectEnd)
numHits++;
}
size = 1.0 / numSamples * numHits;
}
sizes.sort;
writef("%d samples:", numSamples);
foreach (certainty; [50, 90, 99])
{
auto centerDist = numSizes * certainty / 100 / 2;
auto startPos = numSizes / 2 - centerDist;
auto endPos = numSizes / 2 + centerDist;
writef("\t%.5f..%.5f", sizes[startPos], sizes[endPos]);
}
writeln;
}
}
(Run it online.) 它输出:
// 50% 90% 99%
100 samples: 0.47000..0.53000 0.42000..0.58000 0.37000..0.63000
200 samples: 0.47500..0.52500 0.44500..0.56000 0.41000..0.59000
300 samples: 0.48000..0.52000 0.45333..0.54667 0.42667..0.57333
400 samples: 0.48250..0.51750 0.46000..0.54250 0.43500..0.56500
500 samples: 0.48600..0.51600 0.46400..0.53800 0.44200..0.55800
600 samples: 0.48667..0.51333 0.46667..0.53333 0.44833..0.55167
700 samples: 0.48714..0.51286 0.46857..0.53143 0.45000..0.54857
800 samples: 0.48750..0.51250 0.47125..0.53000 0.45375..0.54625
900 samples: 0.48889..0.51111 0.47222..0.52667 0.45778..0.54111
1000 samples: 0.48900..0.51000 0.47400..0.52500 0.45800..0.53900
是否可以精确计算这些数字?
(上下文:我想向 btdu 添加类似“±X.Y GB,99% 确定性”)
好的,问题与语言无关,下面是如何使用 Monte-Carlo 进行误差估计的说明。
假设,你想计算积分
I = S01 f(x) dx
其中 f(x) 是简单的多项式函数
f(x) = xn
Here是计算的说明。
为此,您不仅要计算平均值,还要计算标准差。
然后,知道 Monte Carlo 误差随着样本数量的平方根倒数而下降,计算置信区间就很简单了
代码,Python3.7,Windows10 x64
import numpy as np
rng = np.random.default_rng()
N = 100000
n = 2
def f(x):
return np.power(x, n)
sample = f(rng.random(N)) # N samples of the function
m = np.mean(sample) # mean value of the sample, approaching integral value as N->∞
s = np.std(sample, ddof=1) # standard deviation with Bessel correction
e = s / np.sqrt(N) # Monte Carlo error decreases as inverse square root
t = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
#t = 3.00 # For 99.7% confidence interval, we should take 3 sigma, per Gaussian distribution
print(f'True integral value is {1.0/(1.0+n)}')
print(f'Computed integral value is in the range [{m-t*e}...{m+t*e}] with 99% confidence')
会打印类似
的内容True integral value is 0.3333333333333333
Computed integral value is in the range
[0.33141772204489295...0.3362795491124624] with 99% confidence
您可以使用 Z-score table,第 this one 行,来打印您想要的 table。您可以改变 N 以获得所需的 N 依赖性
zscore = {'50%': 0.674, '80%': 1.282, '90%': 1.645, '95%': 1.960, '98%': 2.326, '99%': 2.576, '99.7%': 3.0}
for c, z in zscore.items():
print(f'Computed integral value is in the range [{m-z*e}...{m+z*e}] with {c} confidence')
根据 Severin 的回答,这里是计算问题中所述值的代码:
def calculate_error(n, m, z):
p = m / n
std_dev = (p * (1 - p)) ** 0.5 # Standard deviation of Bernoulli variable
error = std_dev / n ** 0.5 # Monte Carlo error decreases as inverse square root
return (mean - z * error, mean + z * error)
n = 1000
z = 2.576 # For 99% confidence interval, we should take 2.58 sigma, per Gaussian distribution
print(calculate_error(n, n * 0.5, z))