Creating Probability Trees in R

Question

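I am working in the R programming language.

Suppose I have the following setup:

There are 5 objects: A, B, C, D, E
The probability of selecting each object is: 0.2, 0.3, 0.1, 0.3, 0.1
You want to pick 5 of these objects with replacement (e.g. ABACD, DDBCA, etc.)

I would like to know (the exact solution):

All combinations of these 5 objects
The probability of each combination

At the moment I don't know how to do this. I tried to approach it by simulating a "large number of combinations", hoping to see each combination often enough to infer the correct probabilities: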

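# Approximate the probabilities by simulation: 100 iterations of 5 draws each
results <- list()

for (i in 1:100) {
  # Draw 5 objects with replacement, using the given selection probabilities
  sample_i <- sample(c("A", "B", "C", "D", "E"), size = 5, replace = TRUE,
                     prob = c(0.2, 0.3, 0.1, 0.3, 0.1))

  # Store the draws together with the iteration number
  results[[i]] <- data.frame(iteration = i, sample_i)
}

# Stack the per-iteration data frames into one data frame
results_df <- do.call(rbind, results)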

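But this seems like a very roundabout way to solve this problem. In the end, I am looking for something like this:

AAAAA: probability = 0.03
AABDE: probability = 0.06
DEECB: probability = 0.07
etc.

Can someone show me how to do this?

Thanks!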
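For comparison, a minimal sketch of the same simulation idea without the explicit loop. It assumes 100,000 simulated sequences and a fixed seed, both arbitrary choices not taken from the question. Because draws with replacement are independent, all picks can be generated in one `sample()` call and then tabulated with `table()`. The resulting frequencies only estimate the exact probabilities.

```r
# Sketch: vectorized version of the simulation above.
# n_sim and the seed are arbitrary choices, not taken from the question.
set.seed(1)
objects <- c("A", "B", "C", "D", "E")
probs <- c(0.2, 0.3, 0.1, 0.3, 0.1)
n_sim <- 1e5

# Draw 5 * n_sim objects at once; each column of the matrix is one sequence
draws <- matrix(
  sample(objects, size = 5 * n_sim, replace = TRUE, prob = probs),
  nrow = 5
)
seqs <- apply(draws, 2, paste, collapse = "")

# Relative frequency of each observed sequence (an estimate, not exact)
est <- table(seqs) / n_sim
head(sort(est, decreasing = TRUE))
```

With 3,125 possible sequences, rare ones such as CCCCC (exact probability 0.1^5 = 0.00001) are expected only about once per 100,000 simulated sequences. For this reason the simulation converges slowly here.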

Answer 1

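Edit: This solution works for the example, but it quickly runs out of memory when the probabilities have more significant digits.

Create a vector of object labels whose frequencies correspond to the given probabilities.
Use expand.grid to generate all possible combinations of length 5.
Number of unique rows in the result == number of possible combinations.
Share of the result taken by each combination == probability of that combination.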

objs <- c(
  rep("A", 2),
  rep("B", 3),
  "C",
  rep("D", 3),
  "E"
)
combos <- expand.grid(
  p1 = objs,
  p2 = objs,
  p3 = objs,
  p4 = objs,
  p5 = objs
)
combos <- paste0(
  combos$p1, 
  combos$p2,
  combos$p3,
  combos$p4,
  combos$p5
)
n_combos <- length(combos)
combos_unique <- unique(combos)

# number of combinations
length(combos_unique)
# 3125

# probability of each combination
setNames(
  sapply(combos_unique, \(x) sum(combos == x) / n_combos),
  combos_unique
)

#   AAAAA   BAAAA   CAAAA   DAAAA   EAAAA   ABAAA   BBAAA   CBAAA   DBAAA   EBAAA 
# 0.00032 0.00048 0.00016 0.00048 0.00016 0.00048 0.00072 0.00024 0.00072 0.00024 
#   ACAAA   BCAAA   CCAAA   DCAAA   ECAAA   ADAAA   BDAAA   CDAAA   DDAAA   EDAAA 
# 0.00016 0.00024 0.00008 0.00024 0.00008 0.00048 0.00072 0.00024 0.00072 0.00024 
...
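Counting each string with `sum(combos == x)` scans all 100,000 strings once per distinct combination. The following sketch applies the same counting idea with `table()`, which tallies every distinct string in a single pass. Building the grid from `rep(list(objs), 5)` is a stylistic choice of this sketch, not part of the original answer.

```r
# Labels replicated so their frequencies match 0.2, 0.3, 0.1, 0.3, 0.1
objs <- rep(c("A", "B", "C", "D", "E"), times = c(2, 3, 1, 3, 1))

# All 10^5 ordered picks of five labels (expand.grid accepts a single list)
grid <- expand.grid(rep(list(objs), 5), stringsAsFactors = FALSE)
combos <- do.call(paste0, grid)

# table() counts every distinct string in one pass; divide by the total rows
probs_tab <- table(combos) / length(combos)
length(probs_tab)     # 3125 distinct sequences
probs_tab[["AAAAA"]]  # 2^5 / 10^5, i.e. 0.00032
```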

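The problem with this solution is that the number of rows grows rapidly, not only with more objects or longer combinations, but also as the probabilities gain significant digits. For example, I only need 3 "B" labels (out of 10) to represent a probability of 0.3, but 325 (out of 1,000) to represent a probability of 0.325.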
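The growth can be quantified with a short calculation. If every probability has `digits` decimal places, the label vector has 10^digits entries, and `expand.grid` produces (10^digits)^k rows for sequences of length k. The helper `rows_needed()` and the second probability vector below are hypothetical, chosen only to mirror the 0.325 example. As in the answer, no reduction of the counts by a common factor is attempted.

```r
# Hypothetical helper: rows the label-replication approach needs when every
# probability is written with `digits` decimal places.
rows_needed <- function(probs, digits, k) {
  counts <- round(probs * 10^digits)                    # labels per object, e.g. 0.3 -> 3
  stopifnot(isTRUE(all.equal(sum(counts), 10^digits)))  # probabilities must sum to 1
  sum(counts)^k                                         # one row per ordered pick of k labels
}

rows_needed(c(0.2, 0.3, 0.1, 0.3, 0.1), digits = 1, k = 5)      # 10^5 rows
rows_needed(c(0.2, 0.325, 0.1, 0.275, 0.1), digits = 3, k = 5)  # 1000^5 = 10^15 rows
```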

Answer 2

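The overall probability of each permutation is the product of the probabilities of its selected elements, because draws with replacement are independent.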
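The following base-R sketch applies this product rule without the RcppAlgos dependency, assuming that `expand.grid` is acceptable for enumerating the 5^5 = 3,125 ordered sequences. For each position, it looks up the probability of the drawn object and multiplies across the five positions.

```r
# Base-R sketch of the product rule: P(sequence) = product of per-draw probabilities
probs <- c(A = 0.2, B = 0.3, C = 0.1, D = 0.3, E = 0.1)

# All 5^5 = 3125 ordered sequences; each column holds one draw position
grid <- expand.grid(rep(list(names(probs)), 5), stringsAsFactors = FALSE)

# Look up each draw's probability by name and multiply across the five positions
prob <- Reduce(`*`, lapply(grid, function(draw) unname(probs[draw])))

res_base <- data.frame(combo = do.call(paste0, grid), prob = prob)
head(res_base, 3)
sum(res_base$prob)  # should be 1, up to floating-point rounding
```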

library(RcppAlgos)

# Probabilities
probs <- setNames(c(0.2, 0.3, 0.1, 0.3, 0.1), LETTERS[1:5])

# Generate permutations
perms <- permuteGeneral(names(probs), repetition = TRUE)

# Collapse permutations
perm_res <- do.call(paste, c(asplit(perms, 2), sep = ""))

# Replace with probability values and coerce to numeric
perms[] <- probs[perms]
class(perms) <- "numeric"

# Calculate products
res <- data.frame(perm_res, prob = exp(rowSums(log(perms))))
head(res)

  perm_res    prob
1    AAAAA 0.00032
2    AAAAB 0.00048
3    AAAAC 0.00016
4    AAAAD 0.00048
5    AAAAE 0.00016
6    AAABA 0.00048

# Check total sums to 1
sum(res$prob)
[1] 1
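If order should not matter (for example, if ABACD and ACABD are meant to be the same combination), the ordered probabilities can be grouped by how often each object appears. The following sketch works under that assumption and uses base R's `dmultinom()`. With 5 objects and 5 draws there are choose(9, 5) = 126 such count vectors.

```r
# Assumption: only the number of times each object is drawn matters.
probs <- c(A = 0.2, B = 0.3, C = 0.1, D = 0.3, E = 0.1)
k <- 5  # number of draws

# All count vectors (n_A, ..., n_E) with non-negative entries summing to k
counts <- expand.grid(rep(list(0:k), length(probs)))
counts <- counts[rowSums(counts) == k, ]
names(counts) <- names(probs)

# Label each count vector with a sorted string, e.g. c(2, 1, 0, 1, 1) -> "AABDE"
combo <- apply(counts, 1, function(n) paste(rep(names(probs), n), collapse = ""))

# Multinomial probability of each count vector
prob <- apply(counts, 1, function(n) dmultinom(n, size = k, prob = probs))

res_unordered <- data.frame(combo = unname(combo), prob = unname(prob))
head(res_unordered[order(-res_unordered$prob), ])
```

For example, AABDE groups the 60 orderings of two A's, one B, one D and one E, each with probability 0.2^2 × 0.3 × 0.3 × 0.1 = 0.00036, for a total of 0.0216.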
