My F distribution emulation with the Monte Carlo method in R does not fit. Why?

Question

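As an exercise, I tried to emulate the F distribution of the ratio of two independent sample variances using the Monte Carlo method in R. But my results are noticeably larger than they should be. Why?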

emulateF <- function (numberOfEmulations, sampleSize1, sampleSize2){
    ratioVec <- NULL
    for (i in 1:numberOfEmulations) {
        sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
        sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
        ratio <- var (sample1) / var (sample2)
        if (ratio >= 1) {
            ratioVec <- c(ratioVec, ratio)
        } else {
            ratioVec <- c(ratioVec, 1/ratio)
        }
    }
    return (quantile (ratioVec, 0.975))
}

I expected the result of emulateF (10000, 30, 30) to be very close to qf(0.975,29,29), but it comes out about 10% higher every time. Why?

> qf(0.975,29,29)
[1] 2.100996

and

> for (i in 1:10) {
+ resultsVec <- c (resultsVec, emulateF (10000, 30, 30))
+ }
> resultsVec
   97.5%    97.5%    97.5%    97.5%    97.5%    97.5%    97.5%    97.5% 
2.311599 2.374442 2.377750 2.330585 2.300294 2.359123 2.344875 2.340269 
   97.5%    97.5% 
2.307880 2.350104 
>

The problem persists if I change sd = 9 to the standard sd = 1.

Answer 1

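The fix for your code is to remove the if statement. It forces every stored value to be at least 1, which it should not: an F-distributed ratio falls below 1 whenever the numerator variance is the smaller of the two.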
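
As a minimal sketch of that fix, the question's loop could look like this once the if/else is dropped. The name emulateFFixed is hypothetical, and preallocating the result vector is a small extra change for illustration, not part of the fix itself.

```r
# Hypothetical corrected version of the question's emulateF: same simulation,
# but every ratio is stored as-is instead of being flipped to be >= 1.
emulateFFixed <- function(numberOfEmulations, sampleSize1, sampleSize2) {
    ratioVec <- numeric(numberOfEmulations)  # preallocate instead of growing with c()
    for (i in seq_len(numberOfEmulations)) {
        sample1 <- rnorm(sampleSize1, mean = 0, sd = 9)
        sample2 <- rnorm(sampleSize2, mean = 0, sd = 9)
        ratioVec[i] <- var(sample1) / var(sample2)
    }
    quantile(ratioVec, 0.975)
}

set.seed(123)  # arbitrary seed, only for reproducibility
emulateFFixed(10000, 30, 30)  # should land close to qf(0.975, 29, 29)
```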

FWIW, here is similar code that uses apply instead of a for loop.

myF <- function(n, n1, n2) {
    samp1 <- matrix(rnorm(n1*n, mean=0, sd=9), nrow=n, ncol=n1)
    samp2 <- matrix(rnorm(n2*n, mean=0, sd=9), nrow=n, ncol=n2)
    f <- apply(samp1, 1, var) / apply(samp2, 1, var)
    return(quantile(f, 0.975))
}

set.seed(789)
myF(1e4, 30, 30)
2.09744

qf(0.975, 29, 29)
2.100996
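
The size of the overshoot in the question can also be explained. A sketch, assuming equal degrees of freedom as in the question: if F follows F(29, 29), then so does 1/F, so flipping every ratio to be at least 1 gives P(max(F, 1/F) <= x) = 2 * P(F <= x) - 1. The 0.975 quantile of the flipped ratios is therefore the 0.9875 quantile of F, not the 0.975 quantile. The names foldedQuantile and foldedSim below are hypothetical.

```r
# With equal degrees of freedom, 1/F has the same distribution as F, so
# P(max(F, 1/F) <= x) = P(1/x <= F <= x) = 2 * pf(x, 29, 29) - 1.
# Setting this to 0.975 gives pf(x, 29, 29) = 0.9875.
foldedQuantile <- qf(0.9875, 29, 29)

# Check: the probability mass between 1/x and x is 0.975.
pf(foldedQuantile, 29, 29) - pf(1 / foldedQuantile, 29, 29)

# Simulate the flipped ratio the way the question's code does.
foldedSim <- function(n, n1, n2) {
    f <- replicate(n, var(rnorm(n1, sd = 9)) / var(rnorm(n2, sd = 9)))
    quantile(pmax(f, 1 / f), 0.975)
}

set.seed(456)  # arbitrary seed, only for reproducibility
foldedSim(1e4, 30, 30)   # should be close to foldedQuantile
qf(0.975, 29, 29)        # the value the question compared against
```

Put differently, the flipped ratio exceeds qf(0.975, 29, 29) about 5% of the time rather than 2.5%, which is why the simulated quantile sits consistently above it.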

r

statistics

distribution

montecarlo