Cavs vs. Warriors - the probability of the Cavs winning the series includes combinations like "0,1,0,0,0,1,1" - but the series is over after game 5
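DataCamp has an exercise on computing the probability of winning an NBA series. The Cavs and the Warriors are playing a seven-game championship series. The first team to win four games wins the series. Each team has a 50-50 chance of winning each game. If the Cavs lose the first game, what is the probability that they win the series?
Here is how DataCamp computes the probability using a Monte Carlo simulation: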
B <- 10000
set.seed(1)
results <- replicate(B, {
  x <- sample(0:1, 6, replace = TRUE) # 0 when game is lost and 1 when won.
  sum(x) >= 4
})
mean(results)
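For comparison, here is a minimal sketch of a simulation that stops each series as soon as it is decided, instead of always sampling six games. The helper `simulate_series` and its arguments are hypothetical names, not part of the DataCamp exercise. It assumes the Cavs are already down 0-1, so they need four more wins before three more losses.

```r
# Hypothetical helper: play one series game by game and stop once it is decided.
# Assumption: the Cavs have already lost game 1, so they need 4 more wins,
# and 3 more losses eliminate them.
simulate_series <- function(wins_needed = 4, losses_allowed = 3, p = 0.5) {
  wins <- 0
  losses <- 0
  while (wins < wins_needed && losses < losses_allowed) {
    if (runif(1) < p) {
      wins <- wins + 1      # Cavs win this game
    } else {
      losses <- losses + 1  # Cavs lose this game
    }
  }
  wins == wins_needed       # TRUE if the Cavs clinched the series
}

set.seed(1)
B <- 10000
early_stop_results <- replicate(B, simulate_series())
mean(early_stop_results)
```

With enough replications this should land close to the same ~0.34, which suggests that the unplayed games in the six-sample version do not change the estimate.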
Here is their other approach, which computes the probability with simple code:
# Assign a variable 'n' as the number of remaining games.
n<-6
# Assign a variable `outcomes` as a vector of possible game outcomes: 0 indicates a loss and 1 a win for the Cavs.
outcomes<-c(0,1)
# Assign a variable `l` to a list of all possible outcomes in all remaining games. Use the `rep` function on `list(outcomes)` to create list of length `n`.
l<-rep(list(outcomes),n)
# Create a data frame named 'possibilities' that contains all combinations of possible outcomes for the remaining games.
possibilities<-expand.grid(l) # My comment: note how this produces 64 combinations.
# Create a vector named 'results' that indicates whether each row in the data frame 'possibilities' contains enough wins for the Cavs to win the series.
rowSums(possibilities)
results<-rowSums(possibilities)>=4
# Calculate the proportion of 'results' in which the Cavs win the series.
mean(results)
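The enumeration can also be cross-checked in closed form. As a sketch: counting the rows with at least four wins out of six fair games is a binomial sum, so no data frame is needed. The names `exact` and `exact_binom` are illustrative only.

```r
n <- 6  # remaining games after the game 1 loss

# Count the sequences with 4, 5 or 6 Cavs wins and divide by all 2^6 sequences.
exact <- sum(choose(n, 4:6)) / 2^n

# Same quantity via the binomial distribution: P(X >= 4) = P(X > 3).
exact_binom <- pbinom(3, size = n, prob = 0.5, lower.tail = FALSE)

exact        # 22 / 64 = 0.34375
exact_binom
```

Both values should agree with `mean(results)` from the enumeration above.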
Question/Problem:
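Both approaches give roughly the same probability of winning the series, ~0.34. However, the concept and the code design seem flawed. For example, the code (which samples six times) allows combinations such as: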
G2 G3 G4 G5 G6 G7 rowSums
0 0 0 0 0 0 0 # Series over after G4 (Cavs lose). No need for game G5-G7.
0 0 0 0 1 0 1 # Series over after G4 (Cavs lose). Double counting!
0 0 0 0 0 1 1 # Double counting!
...
1 1 1 1 0 0 4 # No need for game G6 and G7.
1 1 1 1 0 1 5 # Double counting! This is the same as 1,1,1,1,0,0.
0 1 1 1 1 1 5 # No need for game G7.
1 1 1 1 1 1 6 # Series over after G5 (Cavs win). Double counting!
> rowSums(possibilities)
[1] 0 1 1 2 1 2 2 3 1 2 2 3 2 3 3 4 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 1 2 2 3 2 3 3 4 2 3 3 4 3 4 4 5 2 3 3 4 3 4 4 5 3 4 4 5 4 5 5 6
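As you can see, these can never happen. After the Cavs win the first four of the remaining six games, no more games are played. Likewise, after they lose the first three of the remaining six games, no more games are played. So these combinations should not be included when calculating the probability of winning the series. Some combinations are double counted.
Below, I have left out some of the combinations that cannot occur in real life.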
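library(dplyr) # needed for %>%, mutate() and filter()
outcomes <- c(0, 1)
l <- rep(list(outcomes), 6)
possibilities <- expand.grid(l)
# Keep only the rows with at most four Cavs wins.
possibilities <- possibilities %>%
  mutate(rowsums = rowSums(possibilities)) %>%
  filter(rowsums <= 4)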
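But I cannot leave out the other unnecessary combinations. For example, I want to remove two of these three: (a) 1,0,0,0,0,0 (b) 1,0,0,0,0,1 (c) 1,0,0,0,1,1. That is because no more games are played after three losses in a row, so they are essentially double counted.
There are too many conditions for me to filter them one by one. There must be a more efficient and intuitive way to do this. Can someone give me some hints on how to sort out this whole mess?
Here is one approach: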
library(dplyr)
outcomes<-c(0,1)
l<-rep(list(outcomes),6)
possibilities<-expand.grid(l)
possibilities %>%
  mutate(rowsums = rowSums(cur_data()),
         anti_sum = rowSums(!cur_data())) %>%
  filter(rowsums <= 4, anti_sum <= 3)
We rely on the fact that R can coerce numbers to logical values, where 0 is FALSE. See sum(!0) for a short example.
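A different way to look at the "double counting", offered as a sketch rather than as part of the answer above: cut every six-game sequence at the game that decides the series, then collapse identical truncated sequences. The helper `truncate_series` and the variable names are hypothetical. It assumes the Cavs have already lost game 1, so the series ends at their 4th win or their 3rd further loss.

```r
# All 64 six-game sequences for games 2-7 (1 = Cavs win, 0 = Cavs loss).
m <- as.matrix(expand.grid(rep(list(c(0, 1)), 6)))

# Hypothetical helper: keep only the games up to the one that decides the
# series (the Cavs' 4th win or their 3rd further loss) and return them as a string.
truncate_series <- function(x) {
  decided <- which(cumsum(x) == 4 | cumsum(1 - x) == 3)[1]
  paste(x[seq_len(decided)], collapse = "")
}

keys <- apply(m, 1, truncate_series)  # one truncated series per row
copies <- table(keys)                 # how often each real series appears
series <- names(copies)               # the distinct series that can really happen

# A real series of length k has probability 0.5^k ...
prob <- 0.5^nchar(series)
# ... and it appears 2^(6 - k) times among the 64 rows, i.e. in proportion to prob.
all(as.vector(copies) == 2^(6 - nchar(series)))

# A truncated series is a Cavs win exactly when it contains four 1s.
cavs_win <- nchar(gsub("0", "", series)) == 4

length(series)       # number of distinct real series
sum(prob)            # the real series together cover probability 1
sum(prob[cavs_win])  # Cavs' probability of winning the series
```

Under these assumptions there should be 35 distinct real series, and weighting each by its own probability gives 22/64 = 0.34375 for the Cavs. The repeated rows in the 64-row grid act as those weights, which would explain why the unfiltered `mean(results)` already matches.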