Cavs vs. Warriors - probability of the Cavs winning the series includes combinations like "0,1,0,0,0,1,1" - but the series is over after game 5

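There is a DataCamp exercise on computing the probability of winning an NBA series. The Cavs and the Warriors are playing a seven-game championship series. The first team to win four games wins the series. Each team has a 50-50 chance of winning each game. If the Cavs lose the first game, what is the probability that they win the series?

Here is how DataCamp computes the probability using a Monte Carlo simulation: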

B <- 10000
set.seed(1)
results <- replicate(B, {
  x <- sample(0:1, 6, replace = TRUE)  # 0 when a game is lost and 1 when won
  sum(x) >= 4                          # TRUE if the Cavs win at least 4 of the 6 remaining games
})
mean(results)

Here is a different way they compute the probability, using simple code that enumerates every outcome:

# Assign a variable 'n' as the number of remaining games.
n<-6

# Assign a variable `outcomes` as a vector of possible game outcomes: 0 indicates a loss and 1 a win for the Cavs.
outcomes<-c(0,1)

# Assign a variable `l` to a list of all possible outcomes in all remaining games. Use the `rep` function on `list(outcomes)` to create list of length `n`.
l<-rep(list(outcomes),n)

# Create a data frame named 'possibilities' that contains all combinations of possible outcomes for the remaining games.
possibilities<-expand.grid(l) # My comment: note how this produces 64 combinations.

# Create a vector named 'results' that indicates whether each row in the data frame 'possibilities' contains enough wins for the Cavs to win the series.
rowSums(possibilities)
results<-rowSums(possibilities)>=4

# Calculate the proportion of 'results' in which the Cavs win the series. 
mean(results)
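
Because the code above treats the six remaining games as independent 50-50 trials, the enumeration can be cross-checked against the binomial distribution. A minimal sketch using base R's `pbinom` and `choose` (the variable names `exact` and `counted` are illustrative, not from DataCamp):

```r
# P(at least 4 wins in 6 independent 50-50 games).
# pbinom(3, ..., lower.tail = FALSE) is P(X > 3), i.e. P(X >= 4).
exact <- pbinom(3, size = 6, prob = 0.5, lower.tail = FALSE)
exact

# Same value by counting: rows of the 64 with 4, 5 or 6 wins.
counted <- sum(choose(6, 4:6)) / 2^6
counted
```

Both expressions give 22/64, which is the ~0.34 reported by the two DataCamp snippets.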

Question/Problem:

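Both approaches give roughly the same probability of winning the series, ~0.34. However, the concept and the code design seem flawed. For example, the code (which samples six games) allows combinations like these: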

G2   G3   G4   G5   G6   G7 rowSums
0    0    0    0    0    0      0   # Series over after G4 (Cavs lose). No need for game G5-G7.
0    0    0    0    1    0      1   # Series over after G4 (Cavs lose). Double counting!
0    0    0    0    0    1      1   # Double counting!

...
1    1    1    1    0    0      4   # No need for game G6 and G7.
1    1    1    1    0    1      5   # Double counting! This is the same as 1,1,1,1,0,0.
0    1    1    1    1    1      5   # No need for game G7.
1    1    1    1    1    1      6   # Series over after G5 (Cavs win). Double counting! 

> rowSums(possibilities)
 [1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6

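As you can see, these can never happen. Once the Cavs win the first four of the remaining six games, no more games are played. Likewise, once they lose the first three of the remaining six games, no more games are played. So these combinations should not be included when calculating the probability of winning the series. Some combinations are double counted.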
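
One way to check whether the unplayed games actually change the answer is to simulate the series game by game and stop as soon as either team reaches four wins. This is a sketch, not DataCamp's code; the helper name `cavs_win_series` and its arguments are made up for illustration, and it assumes the same 50-50 chance per game.

```r
set.seed(1)
B <- 10000

# Hypothetical helper: play one game at a time, starting from the state
# after game 1 (Cavs 0 wins, Warriors 1 win), and stop once the series is decided.
cavs_win_series <- function(cavs_wins = 0, warriors_wins = 1) {
  while (cavs_wins < 4 && warriors_wins < 4) {
    if (sample(0:1, 1) == 1) {
      cavs_wins <- cavs_wins + 1
    } else {
      warriors_wins <- warriors_wins + 1
    }
  }
  cavs_wins == 4
}

stopped <- replicate(B, cavs_win_series())
mean(stopped)
```

If the estimate lands close to the ~0.34 from the six-game versions, that suggests the extra games do not bias the result: each real series is split into a block of equally likely completions, and the block's total probability equals that of the real series.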

Below is my attempt, which omits some of the combinations that cannot occur in real life.

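library(dplyr)  # needed for %>%, mutate() and filter()
outcomes<-c(0,1)
l<-rep(list(outcomes),6)
possibilities<-expand.grid(l)
# Keep only rows with at most 4 Cavs wins in the remaining six games.
possibilities<-possibilities %>% mutate(rowsums=rowSums(possibilities)) %>% filter(rowsums<=4)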

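However, I could not omit the other unnecessary combinations. For example, I want to remove two of these three: (a) 1,0,0,0,0,0 (b) 1,0,0,0,0,1 (c) 1,0,0,0,1,1. That is because no more games are played after three losses in a row, so these are essentially the same series counted more than once.

There are too many conditions for me to filter them one by one. There must be a more efficient and intuitive way to do this. Can someone give me some hints on how to sort out this whole mess?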

Here is one way:

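library(dplyr)
outcomes<-c(0,1)
l<-rep(list(outcomes),6)
possibilities<-expand.grid(l)
possibilities %>% 
  # rowsums counts the Cavs' wins in each row; anti_sum counts the zeros (losses).
  # Note: cur_data() is deprecated as of dplyr 1.1.0 in favor of pick().
  mutate(rowsums=rowSums(cur_data()),
         anti_sum = rowSums(!cur_data())) %>% 
  # Keep rows with at most 4 wins and at most 3 further losses.
  filter(rowsums<=4, anti_sum <= 3)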

This relies on the fact that numbers can be coerced to logical values, where 0 is FALSE. See sum(!0) for a short example.
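
The rows that survive this filter each stand for one distinct realistic series, but those series have different lengths, so they are not equally likely and `mean()` over the filtered rows would not give the series probability. The sketch below, in base R, weights each kept row by 0.5 raised to the number of games actually played. The helper `games_played` and the names `kept`, `n_played`, `weight` and `cavs_win` are assumptions made for illustration, not part of the answer above.

```r
# All 64 ways to fill in games 2-7 (0 = Cavs loss, 1 = Cavs win).
possibilities <- expand.grid(rep(list(c(0, 1)), 6))
wins   <- rowSums(possibilities)
losses <- rowSums(!possibilities)   # !0 is TRUE, so this counts the zeros

# Same filter as above, written in base R.
kept <- possibilities[wins <= 4 & losses <= 3, ]
nrow(kept)   # 35 distinct series

# Hypothetical helper: how many of the six games are played before the
# Cavs reach 4 wins or suffer 3 further losses (their 4th loss overall).
games_played <- function(x) which(cumsum(x) == 4 | cumsum(1 - x) == 3)[1]
n_played <- apply(kept, 1, games_played)

# A series that ends after k of these games has probability 0.5^k.
weight   <- 0.5^n_played
cavs_win <- rowSums(kept) == 4

sum(weight)             # 1: the kept rows cover every possible series
sum(weight[cavs_win])   # 0.34375 = 22/64
```

The weighted total matches the value from the unfiltered 64 rows, so removing the impossible rows changes the bookkeeping but not the probability.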