Simulating averages of normally-distributed random variables in R

Question

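I am trying to simulate some data in R to check my hand calculations of how the variance changes in a simple model that involves averaging a series of normally distributed random variables. However, the results I get are inconsistent not only with my hand calculations but also with each other. Clearly I am doing something wrong, but I can't work out where the problem lies.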

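Conceptually, the model involves two steps: first, storing a variable, and second, using the stored variables to produce an output. The output is then stored as a new variable that contributes to future outputs, and so on. I assume that storage is noisy (i.e., what is stored is a random variable rather than a constant), but that no further noise is added in output generation, which simply averages the existing stored variables. My model therefore involves the following steps, where V_i is the variable stored at step i and O_i is the i-th output: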

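V_1 ~ N(0, 1), O_1 = V_1
V_2 ~ N(O_1, 1), O_2 = (V_1 + V_2) / 2
V_3 ~ N(O_2, 1), O_3 = (V_1 + V_2 + V_3) / 3

and so on.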

I tried simulating this in R in two ways. First:

nSamples <- 100000
o1 <- rnorm(nSamples) # First output
o2 <- rowMeans(cbind(rnorm(nSamples, mean=o1),rnorm(nSamples))) # Second output, averaged from first two stored variables.
o3 <- rowMeans(cbind(rnorm(nSamples, mean=o2),rnorm(nSamples, mean=o1),rnorm(nSamples))) # Third output, averaged from first three stored variables.

This gives me

var(o1) # Approximately 1, as per my manual calculations.
var(o2) # Approximately .75, as per my manual calculations.
var(o3) # Approximately .64, where my manual calculations give 19/36 or approximately .528.

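Initially, I trusted the code and assumed my calculations were wrong. Then I tried the following alternative code, which follows the steps I used by hand more explicitly: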

nSamples <- 100000
initialValue <- 0
v1 <- rnorm(nSamples, initialValue)
o1 <- v1
v2 <- rnorm(nSamples, o1)
o2 <- rowMeans(cbind(v1, v2))
v3 <- rnorm(nSamples, o2)
o3 <- rowMeans(cbind(v1, v2, v3))

This gives me

var(o1) # Approximately 1, as per my calculations.
var(o2) # Approximately 1.25, where my calculations give .75.
var(o3) # Approximately 1.36, where my calculations give approximately .528.

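So clearly I am doing something wrong in at least two of these three approaches, but I can't identify the source of the problem. What am I missing that makes my code behave differently from what I expect? And what is the difference between the two code samples that causes the variance to decrease in one and increase in the other?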

Answer 1

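Your correct calculation is the first one, in which you generate new realizations of normal random variables when averaging, instead of reusing the realizations generated in the previous step.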

Indeed, the distribution of O2 assumes that the two normal random variables being averaged are independent of each other.
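The 0.75 figure follows from the rule for averaging independent variables: the variance of the mean of n independent variables is the sum of their variances divided by n^2. A minimal sketch of that arithmetic is below; the helper `var_of_mean_indep` is a name introduced here for illustration and is not part of the question's code. It assumes every stored variable adds unit-variance noise to its mean, as in the `rnorm` calls in the question.

```r
# Variance of the mean of n mutually independent variables: sum(variances) / n^2.
# (Hypothetical helper, valid only under the independence assumption.)
var_of_mean_indep <- function(variances) sum(variances) / length(variances)^2

var_o1 <- 1
# O2 averages a draw centred on O1 (variance var_o1 + 1) and a fresh N(0, 1) draw.
var_o2 <- var_of_mean_indep(c(var_o1 + 1, 1))
# Treating all three inputs to O3 as independent as well:
var_o3_indep <- var_of_mean_indep(c(var_o2 + 1, var_o1 + 1, 1))

print(c(var_o2 = var_o2, var_o3_indep = var_o3_indep))
```

The first value is 0.75 and the second equals 19/36, the hand-calculated figure quoted in the question. This suggests the hand calculation applies the independence rule at every step.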

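In your second calculation this is not true, because you average v1 and v2, which are not independent: both depend on o1. That is why you get a larger variance in the second case.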
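One way to check the effect of this dependence without simulation is to write every variable as a linear combination of independent N(0, 1) draws. The variance is then the sum of squared coefficients, and the covariance of two variables is the dot product of their coefficient vectors. The sketch below does this bookkeeping for both snippets from the question. The helpers `lin_var` and `unit` and the `_a`/`_b` suffixes are names introduced here for illustration, and each `rnorm` call is assumed to contribute one independent unit-variance term.

```r
# Represent each variable by its coefficients on independent N(0, 1) draws.
# Variance = sum of squared coefficients; covariance = dot product.
lin_var <- function(coefs) sum(coefs^2)
unit <- function(i, n) replace(numeric(n), i, 1)  # i-th basis vector of length n

# Second snippet: basis (v1, e2, e3); e2 and e3 are the fresh storage noise terms.
v1 <- unit(1, 3)
v2 <- v1 + unit(2, 3)          # v2 = o1 + e2, with o1 = v1
o2_b <- (v1 + v2) / 2
v3 <- o2_b + unit(3, 3)        # v3 = o2 + e3
o3_b <- (v1 + v2 + v3) / 3
cov_v1_v2 <- sum(v1 * v2)      # v1 and v2 share the v1 term

# First snippet: basis (o1, e1, ..., e5), one entry per rnorm() call.
o1 <- unit(1, 6)
o2_a <- ((o1 + unit(2, 6)) + unit(3, 6)) / 2
o3_a <- ((o2_a + unit(4, 6)) + (o1 + unit(5, 6)) + unit(6, 6)) / 3

print(c(cov_v1_v2 = cov_v1_v2,
        var_o2_b = lin_var(o2_b), var_o3_b = lin_var(o3_b),
        var_o2_a = lin_var(o2_a), var_o3_a = lin_var(o3_a)))
```

For the second snippet this gives a covariance of 1 between v1 and v2, and variances of 1.25 and 49/36 (about 1.36), in line with the simulated values. For the first snippet it gives 0.75 and 23/36 (about 0.64). The gap between 23/36 and the hand-calculated 19/36 appears to come from the same source: in the third output, the draws centred on o2 and on o1 both carry o1, so they are not independent either.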

statistics

r

normal-distribution