Bayesian Statistics: Simulate the experiment described above 10,000 times in R and note the length of the longest run each time

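I am trying to find the average length of the longest run across 10,000 simulations of 30 coin flips each. I need to simulate the experiment described above 10,000 times in R, noting the length of the longest run each time.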

Here is my code so far:

coin <- sample(c("H", "T"), 10000, replace = TRUE)
table(coin) 
head(coin, n = 30)
rle(c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H"))
coin.rle <- rle(coin)
str(coin.rle)

How do I find the average length of the longest run over the 10,000 simulations?

I think the following does what you want.

n_runs <- 10000
max_runs <- numeric(n_runs)
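for (j in seq_len(n_runs)) {
  # one simulation: 30 fair coin flips
  coin <- sample(c("H", "T"), 30, replace = TRUE)
  # rle() returns the length of each run; keep the longest one
  max_runs[j] <- max(rle(coin)$lengths)
}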
mean(max_runs)

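To understand the code, it helps to inspect a short segment of coin (e.g. coin[1:20]) and its run-length encoding (rle(coin[1:20])). rle() computes the length of each run, so max(rle(coin)$lengths) gives the length of the longest run.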
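To make the rle() step concrete, here is a minimal sketch that reuses the ten-flip example vector from the question. The variable names x, r, and longest are only illustrative.

```r
# Ten example flips (the same vector the question passes to rle())
x <- c("H", "T", "T", "H", "H", "H", "H", "H", "T", "H")

# Run-length encoding: one entry per run of identical values
r <- rle(x)
r$values   # the side that each run consists of
r$lengths  # how long each run is

# The longest run is the largest of the run lengths
longest <- max(r$lengths)
longest
```

Here the runs are H, TT, HHHHH, T, H, so the longest run has length 5.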

Edit: the following may be faster.

len <- 30
times <- 10000

flips <- sample(c("H", "T"), len * times, replace = TRUE) 
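# split the flips into consecutive blocks of `len`, then take the longest run in each block
runs <- sapply(split(flips, ceiling(seq_along(flips) / len)),
               function(x) max(rle(x)$lengths))
mean(runs) # average of the longest runs
sum(runs >= 7) / times # proportion of simulations whose longest run is >= 7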

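All coin flips are independent of one another (i.e. the outcome of one flip does not affect any other flip). Because of this, we can flip all of the coins for all of the simulations at once and then arrange them so that each 30-flip trial is easy to summarize. Here is how I would approach it.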

# do all of the flips at once, this is okay because each flip
# is independent
coin_flips <- sample(c("heads", "tails"), 30 * 10000, replace = TRUE)

# put them into a 10000 by 30 matrix, each row
# indicates one 'simulation' (matrix() fills column by column,
# which is fine here because the flips are independent)
coin_matrix <- matrix(coin_flips, ncol = 30, nrow = 10000)

# we now want to iterate through each row using apply,
# to do so we need to make a function to apply to each
# row. This gets us the longest run over a single
# simulation
get_long_run <- function(x) {
  max(rle(x)$lengths)
}

# apply this function to each row
longest_runs <- apply(coin_matrix, 1, get_long_run)

# get the number of simulations that had a max run >= 7. Divide this
# by the number of simulations to get the probability of this occurring.
sum(longest_runs >= 7)/nrow(coin_matrix)

You should get a value between 18% and 19%, though it will vary slightly each time you run the simulation.
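As a cross-check on the simulated estimate, the probability can also be computed exactly with a small dynamic program. The sketch below is not part of the answers above; the function name prob_long_run and its arguments n and k are hypothetical, and it assumes a fair coin and n >= 1. It tracks the probability that no run of length k or more has occurred yet, broken down by the length of the current run.

```r
# Exact probability that the longest run in n fair coin flips is at least k.
# p[r] is the probability that no run of length >= k has occurred so far
# and the current run has length r (r = 1, ..., k - 1).
prob_long_run <- function(n, k) {
  if (k <= 1) return(1)  # a single flip is already a run of length 1
  p <- numeric(k - 1)
  p[1] <- 1  # after the first flip the current run has length 1
  for (i in seq_len(n - 1)) {
    new_p <- numeric(k - 1)
    # next flip differs from the previous one: the run resets to length 1
    new_p[1] <- 0.5 * sum(p)
    # next flip matches the previous one: the run grows by 1
    # (a run that would reach length k is dropped from the table)
    if (k > 2) {
      new_p[2:(k - 1)] <- 0.5 * p[1:(k - 2)]
    }
    p <- new_p
  }
  1 - sum(p)
}

# Probability that 30 flips contain a run of 7 or more identical outcomes
prob_long_run(30, 7)
```

Comparing this value with sum(longest_runs >= 7)/nrow(coin_matrix) gives a quick sanity check on the simulation; the two should agree up to sampling noise.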