dice roll math with large n (>100)
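I promise this isn't just another dice-rolling homework problem. I implemented a function to calculate the probability of getting a sum less than s when rolling n m-sided dice. My function works for small values of n, but I'm finding that large values of n produce weird results. See the attached plot. Does anyone know what's going on?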
My probability function, implemented from this math stack exchange post:
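probability <- function(s, m, n) {
  # P(sum of n m-sided dice < s), as an alternating (inclusion-exclusion) sum.
  # The `:` operator drops the fractional part of the upper bound.
  i <- 0:((s-1-n) / m)
  m^(-n) * sum((-1)^i * choose(n, i) * choose(s - 1 - i * m, n))
}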
It starts to break at around n > 80:
n <- 1:90 # number of dice
m <- 6 # number of sides
s <- floor(mean(1:m)*n) # sum of faces
p <- mapply(probability, s = s, m = m, n = n)
plot(n, p, main = paste("probability of rolling less than floor(", mean(1:m),"* n) as sum of n rolls of a", m, "sided dice"))
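The problem is caused by the limits of R's numerical precision. As the commenters pointed out, the n choose k values I was computing above are really, really large (choose(80,40) = 1.075072e+23).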
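To make the scale concrete, here is a minimal sketch that looks at the individual terms the function has to add up for n = 90. The helper name `series_terms` is made up for this illustration; everything else is base R.

```r
# Hypothetical helper: the individual terms of the alternating sum,
# before they are added together and scaled by m^(-n).
series_terms <- function(s, m, n) {
  i <- 0:((s - 1 - n) / m)
  (-1)^i * choose(n, i) * choose(s - 1 - i * m, n)
}

n <- 90
m <- 6
s <- floor(mean(1:m) * n)
terms <- series_terms(s, m, n)
largest <- max(abs(terms))

# Doubles carry 53 bits of mantissa, so integers above 2^53 get rounded.
print(largest > 2^53)

# Rough size of the rounding error in the largest term, relative to the
# m^n that the sum is divided by.
print(largest * .Machine$double.eps / m^n)
```

The second number is only an order-of-magnitude guide, but it grows quickly with n, which is consistent with the plot falling apart somewhere past n = 80.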
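We can use logs to try to keep the problem within R's computational limits. Here is an implementation using Ramanujan's approximation. Unfortunately, the errors in the approximation compound and precision decays even faster. The probability function has to add and subtract a series of very large numbers to arrive at a final value between 0 and 1, and it is intolerant of any imprecision.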
0) Rewrite the probability function, broken into steps
probability <- function(s, m, n) {
  # Probability of getting less than s
  i <- 0:((s-1-n) / m)
  c1 <- choose(n, i)
  c2 <- choose(s - 1 - i * m, n)
  seq <- (-1)^i * (c1 * c2)
  m^(-n) * sum(seq)
}
1) Implement an approximation of log(x!)
# using the 'ramanujan' method
ramanujan <- function(n){
  n * log(n) - n + log(n * (1 + 4*n * (1 + 2*n))) / 6 + log(pi) / 2
}
# confirm Ramanujan works correctly
n <- 1:200
diff <- log(factorial(n)) - ramanujan(n)
plot(n, diff) # R returns Inf for factorial(171), but up to there the numbers match
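To check the approximation past n = 170, base R's `lfactorial()` computes log(n!) without forming n! first, so it does not overflow. A small sketch of that comparison follows; it repeats the `ramanujan` definition so that it runs on its own, and `err` is just a name chosen for this example.

```r
ramanujan <- function(n) {
  n * log(n) - n + log(n * (1 + 4 * n * (1 + 2 * n))) / 6 + log(pi) / 2
}

n <- 1:200
# lfactorial(n) is log(n!) computed in log space, finite for all of 1:200
err <- lfactorial(n) - ramanujan(n)

print(all(is.finite(err)))
print(max(abs(err)))   # worst absolute error over 1:200
print(abs(err[200]))   # absolute error at n = 200
```

The error is largest for small n and shrinks as n grows, so the approximation of log(x!) on its own looks fine.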
2) Rewrite the choose function using the log approximation.
#' This function returns log(choose(n,k))
log_nck <- Vectorize(function(n, k) {
  # fall back to the exact value at the edges: logs don't like 0 or neg numbers
  if (n <= k || n < 1 || k < 1) return(log(choose(n, k)))
  ramanujan(n) - ramanujan(k) - ramanujan(n - k)
})
# Check that choose function works
n <- seq(10, 100, 10)
k <- seq(5, 50, 5)
c_real <- log(choose(n, k))
c_approx <- log_nck(n, k)
# If we print them, they appear to match
print(c_real)
print(c_approx)
# and the difference shows pretty small errors.
print(c_real - c_approx)
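Base R also ships `lchoose(n, k)`, which returns log(choose(n, k)) directly rather than through an approximation of the factorial. A quick sketch of how it behaves on the same grid, and on an argument where `choose()` overflows (the value 2000 is picked only for illustration):

```r
n <- seq(10, 100, 10)
k <- seq(5, 50, 5)

# lchoose() agrees with log(choose()) where choose() is still finite
print(max(abs(lchoose(n, k) - log(choose(n, k)))))

# ... and stays finite where choose() overflows to Inf
print(choose(2000, 1000))
print(lchoose(2000, 1000))
```

Working in log space removes the overflow, but it does not remove the cancellation: the terms still have to be exponentiated and summed with alternating signs.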
3) Rewrite the probability function using the log choose.
new_probability <- function(s, m, n) {
  # Probability of getting less than s
  i <- 0:((s-1-n) / m)
  c1 <- log_nck(n, i)
  c2 <- log_nck(s - 1 - i * m, n)
  seq <- (-1)^i * exp(c1 + c2)
  return(m^(-n) * sum(seq))
}
Final test
n <- 1:90 # number of dice
m <- 6 # number of sides
s <- floor(mean(1:m)*n) # sum of faces
p <- mapply(probability, s = s, m = m, n = n)
newp <- mapply(new_probability, s = s, m = m, n = n)
plot(n, p, main = "Original in black, approximation in red")
points(n, newp, col = "red")
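One way to see why the approximation does worse rather than better is to look at its relative error on a single coefficient. The sketch below repeats the definitions so that it runs standalone (with a plain, non-vectorized `log_nck`), and compares exp(log_nck(80, 40)) against choose(80, 40) and against `.Machine$double.eps`, the relative spacing of doubles.

```r
ramanujan <- function(n) {
  n * log(n) - n + log(n * (1 + 4 * n * (1 + 2 * n))) / 6 + log(pi) / 2
}
# Simplified scalar version, valid for 1 <= k < n
log_nck <- function(n, k) ramanujan(n) - ramanujan(k) - ramanujan(n - k)

exact   <- choose(80, 40)
approx  <- exp(log_nck(80, 40))
rel_err <- abs(approx - exact) / exact

print(rel_err)                        # relative error of the approximation
print(rel_err / .Machine$double.eps)  # same error in units of double rounding
```

Every term carries an error of this kind into a sum whose terms largely cancel, so it gets amplified in the same way as the rounding error in the original function, only starting from a larger value.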
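As mentioned in the comments on the original question, the problem is that the probability function asks R to compute really huge numbers (choose(80,40) = 1.075072e+23), and we are hitting the limits of R's numerical precision.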
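An alternative approach that doesn't involve huge numbers, but instead uses lots of numbers, is to run a Monte Carlo simulation. This generates a distribution of dice sums and compares the observed sum against that distribution. It takes longer to run, but it is much easier to do and does not suffer from numerical precision problems.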
mc <- Vectorize(function(s, m, n, reps = 10000) {
  x <- replicate(reps, sum(sample(m, n, replace = TRUE)))
  ecdf(x)(s-1)
})
n <- 1:90 # number of dice
m <- 6 # number of sides
s <- floor(mean(1:m)*n) # sum of faces
analytic_prob <- mapply(probability, s = s, m = m, n = n)
mc_prob <- mapply(mc, s = s, m = m, n = n)
plot(n, analytic_prob, main = paste("probability of rolling less than floor(", mean(1:m),"* n) as sum of n rolls of a", m, "sided dice"),
sub = "monte carlo in red")
points(n, mc_prob, col = "red")
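As a cross-check on both curves, here is a sketch of a third approach that neither answer above uses: building the exact distribution of the sum by repeated convolution. Every intermediate value is a probability between 0 and 1 and nothing is ever subtracted, so the cancellation problem should not arise. `dice_prob_below` is a made-up name for this illustration.

```r
# Hypothetical helper: P(sum of n m-sided dice < s) via repeated convolution.
dice_prob_below <- function(s, m, n) {
  pmf  <- 1                # zero dice: the sum is 0 with probability 1
  face <- rep(1 / m, m)    # P(face = j) for j = 1..m
  for (d in seq_len(n)) {
    # After d dice the possible sums are d..(d * m);
    # new[k] holds P(sum = d + k - 1).
    new <- numeric(length(pmf) + m - 1)
    for (j in seq_len(m)) {
      idx <- j:(j + length(pmf) - 1)
      new[idx] <- new[idx] + pmf * face[j]
    }
    pmf <- new
  }
  totals <- n:(n * m)      # the sum that each entry of pmf refers to
  sum(pmf[totals < s])
}

n <- c(2, 50, 100)
s <- floor(mean(1:6) * n)
exact <- mapply(dice_prob_below, s = s, m = 6, n = n)
print(exact)
```

For two dice this gives P(sum < 7) = 15/36, which is easy to confirm by hand. The cost grows roughly with n^2 * m^2, which is still small for a few hundred dice.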