Enumerate all possible combined probabilities of a series of Bernoulli trials with different probabilities
Suppose I have the success probabilities for a series of n independent Bernoulli trials, p1 to pn, such that p1 != p2 != ... != pn. Each trial is given a unique name.
p <- c(0.5, 0.12, 0.7, 0.8, .02)
a <- c("A","B","C","D","E")
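I know from searching Stack Exchange (e.g., here and here) that I can use the Poisson binomial distribution functions to find the cdf, pmf, etc.
What I'm interested in is the exact probability of each possible combination of successes and failures. (E.g., if I drew a probability tree, I would want to know the probability at the end of each branch.)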
all <- prod(p)
all
[1] 0.000672
o1 <- (0.5 * (1-0.12) * 0.7 * 0.8 * .02)
o1
[1] 0.004928
o2 <- (0.5 * 0.12 * (1-0.7) * 0.8 * .02)
o2
[1] 0.000288
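...and so on for all 2^5 possible combinations of success/failure.
What is an efficient way to do this in R?
In my actual dataset the number of trials is 19, so we are talking about 2^19 total paths through the probability tree.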
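The key to a fast computation is to work in log-probability space, so that the product along each branch of the tree becomes a sum, which can be computed as the inner sum of a matrix multiplication. This way, all branches can be computed together in a vectorized manner.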
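To make the log-space idea concrete, here is a minimal sketch for a single branch, using the question's p and the outcome behind o1 (A succeeds, B fails, C, D and E succeed). The names outcome, direct and via_logs are only illustrative.

```r
p <- c(0.5, 0.12, 0.7, 0.8, 0.02)
outcome <- c(1, 0, 1, 1, 1)  # 1 = success, 0 = failure; this is branch o1

# Direct product of the per-trial probabilities
direct <- prod(ifelse(outcome == 1, p, 1 - p))

# Same quantity as a sum of logs, exponentiated at the end
via_logs <- exp(sum(ifelse(outcome == 1, log(p), log(1 - p))))

all.equal(direct, via_logs)
```

Doing this for every branch at once is what the matrix multiplication in this answer achieves.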
First, we construct an enumeration of all the branches. For this, we use the intToBin function from the R.utils package:
library(R.utils)
enum.branches <- unlist(strsplit(intToBin(seq_len(2^n)-1),split=""))
where n is the number of Bernoulli variables. For example, with n=5:
matrix(enum.branches, nrow=n)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13] [,14] [,15] [,16] [,17]
##[1,] "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "0" "1"
##[2,] "0" "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "1" "1" "1" "1" "1" "0"
##[3,] "0" "0" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "1" "1" "0"
##[4,] "0" "0" "1" "1" "0" "0" "1" "1" "0" "0" "1" "1" "0" "0" "1" "1" "0"
##[5,] "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0"
## [,18] [,19] [,20] [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32]
##[1,] "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1" "1"
##[2,] "0" "0" "0" "0" "0" "0" "0" "1" "1" "1" "1" "1" "1" "1" "1"
##[3,] "0" "0" "0" "1" "1" "1" "1" "0" "0" "0" "0" "1" "1" "1" "1"
##[4,] "0" "1" "1" "0" "0" "1" "1" "0" "0" "1" "1" "0" "0" "1" "1"
##[5,] "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1" "0" "1"
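This yields a matrix in which each column is the outcome of one branch of the probability tree.
Now use it to construct a matrix of log probabilities of the same size as enum.branches, where the value is log(p) if enum.branches=="1" and log(1-p) otherwise. For your data, with p <- c(0.5, 0.12, 0.7, 0.8, .02), this is: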
logp <- matrix(ifelse(enum.branches == "1", rep(log(p), 2^n), rep(log(1-p), 2^n)), nrow=n)
Then sum the log probabilities and exponentiate to get the product of the probabilities:
result <- exp(rep(1,n) %*% logp)
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##[1,] 0.025872 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 7.2e-05
## [,11] [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20]
##[1,] 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672 0.025872 0.000528 0.103488 0.002112
## [,21] [,22] [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30]
##[1,] 0.060368 0.001232 0.241472 0.004928 0.003528 7.2e-05 0.014112 0.000288 0.008232 0.000168
## [,31] [,32]
##[1,] 0.032928 0.000672
result will be in the same order as the numbering of the branches in enum.branches.
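Because the ordering is just binary counting with the first trial as the most significant bit, a column index can be decoded back into its success/failure pattern. The following is a small sketch of that; branch_outcome is a hypothetical helper, not part of R.utils.

```r
p <- c(0.5, 0.12, 0.7, 0.8, 0.02)
a <- c("A", "B", "C", "D", "E")
n <- length(p)

# Decode the k-th column (1-based) into a 0/1 vector, first trial first
branch_outcome <- function(k, n) ((k - 1) %/% 2^((n - 1):0)) %% 2

x <- setNames(branch_outcome(24, n), a)  # column 24 is the pattern 1 0 1 1 1
x
prod(ifelse(x == 1, p, 1 - p))           # probability of that branch
```

Column 24 corresponds to o1 from the question, so the last line should agree with the 24th entry of result.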
We can wrap the computation in a function:
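enum.prob.product <- function(n, p) {
  # Binary strings for 0..2^n-1, split into single "0"/"1" characters
  enum.branches <- unlist(strsplit(intToBin(seq_len(2^n)-1),split=""))
  # Sum the log probabilities down each column, then exponentiate
  exp(rep(1,n) %*% matrix(ifelse(enum.branches == "1", rep(log(p), 2^n), rep(log(1-p), 2^n)), nrow=n))
}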
Timing this with 19 independent Bernoulli variables:
n <- 19
p <- runif(n)
system.time(enum.prob.product(n,p))
## user system elapsed
## 24.064 1.470 26.082
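This was on my 2 GHz MacBook (circa 2009). Note that the computation itself is fairly fast; it is the enumeration of the branches of the probability tree (I am guessing the unlist in there) that takes most of the time. Any suggestions from the community on other approaches would be appreciated.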
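One possible alternative, offered as an unbenchmarked sketch, is to skip the string enumeration altogether and build the branch probabilities one trial at a time: each new trial doubles the vector by multiplying every existing branch by (1 - p_i) and by p_i. The function name enum_prob_doubling is made up for this illustration.

```r
enum_prob_doubling <- function(p) {
  out <- 1
  for (p_i in p) {
    # outer() gives a 2 x length(out) matrix; flattening it column-wise
    # makes the newest trial the fastest-varying (least significant) bit
    out <- as.vector(outer(c(1 - p_i, p_i), out))
  }
  out
}

p <- c(0.5, 0.12, 0.7, 0.8, 0.02)
res <- enum_prob_doubling(p)
length(res)  # 2^5 branches
sum(res)     # the branch probabilities should sum to 1
```

With the trials processed in order, the first trial ends up as the most significant bit, so the output should follow the same branch ordering as result above. This version multiplies probabilities directly rather than summing logs, which seems adequate for 19 trials but is a different technique from the one in this answer.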
Just try this in base R:
p <- c(0.5, 0.12, 0.7, 0.8, .02)
a <- c("A","B","C","D","E")
n <- length(p)
apply(expand.grid(replicate(n,list(0:1)))[n:1], 1,
function(x) prod(p[which(x==1)])*prod(1-p[which(x==0)]))
#[1] 0.025872 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 0.000072 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672 0.025872
#[18] 0.000528 0.103488 0.002112 0.060368 0.001232 0.241472 0.004928 0.003528 0.000072 0.014112 0.000288 0.008232 0.000168 0.032928 0.000672
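Since the question also assigns a name to each trial, a small variation on the same expand.grid idea keeps the outcomes and their probabilities together in one data frame. This is only a sketch; the names grid and prob are illustrative.

```r
p <- c(0.5, 0.12, 0.7, 0.8, 0.02)
a <- c("A", "B", "C", "D", "E")
n <- length(p)

# One row per branch; reversing the columns makes the last trial vary fastest
grid <- expand.grid(replicate(n, 0:1, simplify = FALSE))[n:1]
names(grid) <- a

# Probability of each row: p where the trial succeeded, 1 - p where it failed
grid$prob <- apply(grid[a], 1, function(x) prod(ifelse(x == 1, p, 1 - p)))

head(grid)
```

Each row then reads as a labelled branch of the tree, for example which of A to E succeeded and how likely that combination is.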