Simulation based on set of rules in R
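I want to run a sim that randomly selects rows and sums the values of those rows according to a set of rules. I'm new to simulation, so I don't know where to start.
Rules: each sim selects 9 rows in total. Each 9-player sim must contain the following number of each "position":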
QB: 1
RB: 2
WR: 3
TE: 1
K: 1
DST: 1
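I'd like each sim to sum the group's values (the WAR column), and the output to show, for each player, the percentage of the time they appear in, say, the top 10% of groups by total WAR. Hopefully that makes sense. The end goal here is to determine which players are most likely to lead to success.
Here is an example with the top ten players at each position.
Input: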
structure(list(player = c("Justin Tucker", "Harrison Butker",
"Wil Lutz", "Greg Zuerlein", "Matt Gay", "Brandon McManus", "Jake Elliott",
"Robbie Gould", "Stephen Hauschka", "Dan Bailey", "Patrick Mahomes",
"Lamar Jackson", "Dak Prescott", "Russell Wilson", "Kyler Murray",
"Deshaun Watson", "Matt Ryan", "Josh Allen", "Tom Brady", "Carson Wentz",
"Christian McCaffrey", "Saquon Barkley", "Ezekiel Elliott", "Alvin Kamara",
"Dalvin Cook", "Clyde Edwards-Helaire", "Derrick Henry", "Miles Sanders",
"Joe Mixon", "Josh Jacobs", "Travis Kelce", "George Kittle",
"Mark Andrews", "Zach Ertz", "Darren Waller", "Evan Engram",
"Hayden Hurst", "Tyler Higbee", "Hunter Henry", "Mike Gesicki",
"Michael Thomas", "Davante Adams", "Julio Jones", "Tyreek Hill",
"DeAndre Hopkins", "Chris Godwin", "Kenny Golladay", "Allen Robinson",
"DJ Moore", "Odell Beckham"), adp = c(3, 3, 2, 2, 1, 1, 1, 1,
1, 1, 26, 23, 12, 11, 10, 9, 5, 4, 4, 4, 66, 57, 53, 50, 45,
43, 41, 40, 40, 39, 29, 26, 18, 15, 10, 8, 7, 6, 4, 4, 48, 40,
38, 37, 36, 34, 29, 27, 27, 27), WAR = c(0.27, 0.27, 0.1, 0.23,
0.09, 0.19, -0.83, -0.3, -0.1, -0.62, 2.26, 1.41, 0.91, 1.7,
2.28, 1.74, 0.28, 2.29, 1.12, 0.06, 1.02, -0.05, 1.36, 3.57,
3.48, 1.04, 2.91, 1.13, 0.69, 1.49, 2.79, 0.71, 0.85, -0.22,
1.67, 0.07, 0.26, 0.06, 0.35, 0.64, -0.04, 2.74, 0.63, 2.35,
1.49, 0.49, 0.33, 1.17, 0.61, 0.28), position = c("K", "K", "K",
"K", "K", "K", "K", "K", "K", "K", "QB", "QB", "QB", "QB", "QB",
"QB", "QB", "QB", "QB", "QB", "RB", "RB", "RB", "RB", "RB", "RB",
"RB", "RB", "RB", "RB", "TE", "TE", "TE", "TE", "TE", "TE", "TE",
"TE", "TE", "TE", "WR", "WR", "WR", "WR", "WR", "WR", "WR", "WR",
"WR", "WR")), row.names = c(NA, -50L), groups = structure(list(
position = c("K", "QB", "RB", "TE", "WR"), .rows = structure(list(
1:10, 11:20, 21:30, 31:40, 41:50), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -5L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
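One idea is to use a lookup table to set the number of samples for each group, then create a function that runs a "simulation" by sampling n_samples rows from each group. I'm not entirely sure what you need from the sum of WAR, but once you have the simulations, an operation like a grouped sum should be straightforward.
Note that there is no "DST" position in your sample data, so each simulation will contain only 8 players.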
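The gap between the rules and the sample data can be confirmed with a quick check. This is a minimal base R sketch; the names `required`, `available`, `missing_positions` and `n_drawn` are illustrative and do not appear in the original post.

```r
# Roster rules from the question: number of players needed per position
required <- c(QB = 1, RB = 2, WR = 3, TE = 1, K = 1, DST = 1)

# Positions that actually occur in the question's sample data
available <- c("K", "QB", "RB", "TE", "WR")

# Positions the rules ask for but the data cannot supply
missing_positions <- setdiff(names(required), available)
missing_positions  # "DST"

# Number of players that can be drawn per simulation with this data
n_drawn <- sum(required[names(required) %in% available])
n_drawn  # 8
```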
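library(tidyverse)

# `df` is the data frame built from the dput() output in the question.

# Lookup table: how many players to draw at each position
df_sample <- data.frame(position = c("K", "QB", "RB", "TE", "WR", "DST"),
                        n_samples = c(1, 1, 2, 1, 3, 1))

# Attach the sample size to every row, then nest the players by position
df_nest <- df %>%
  left_join(df_sample, by = "position") %>%
  group_by(position, n_samples) %>%
  nest()

# One simulation: draw n_samples rows (without replacement) from each position
run_sim <- function(nested_df = df_nest) {
  nested_df %>%
    mutate(sim = map2(data, n_samples, sample_n)) %>%
    ungroup() %>%
    select(-data, -n_samples) %>%
    unnest(sim)
}

# Run 10 simulations and stack them; `.id` records the simulation number
map_dfr(1:10, ~run_sim(df_nest), .id = 'sim')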
#----
# A tibble: 80 x 5
   sim   position player             adp   WAR
   <chr> <chr>    <chr>            <dbl> <dbl>
 1 1     K        Dan Bailey           1 -0.62
 2 1     QB       Patrick Mahomes     26  2.26
 3 1     RB       Miles Sanders       40  1.13
 4 1     RB       Joe Mixon           40  0.69
 5 1     TE       Evan Engram          8  0.07
 6 1     WR       Julio Jones         38  0.63
 7 1     WR       Michael Thomas      48 -0.04
 8 1     WR       DeAndre Hopkins     36  1.49
 9 2     K        Stephen Hauschka     1 -0.1
10 2     QB       Russell Wilson      11  1.7
# ... with 70 more rows
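For the follow-up step the question asks about (total WAR per simulated group, then how often each player lands in the top 10% of groups), one possible approach is sketched here. It is written in base R so that it runs without extra packages, and it uses a small toy subset of the question's players rather than the full data. The helper names `draw_lineup` and `simulate_top_share` are hypothetical, DST is left out because the sample data has none, and counting ties at the cutoff as "top" is an assumption.

```r
# Toy subset of the question's data (player, position, WAR)
players <- data.frame(
  player   = c("Justin Tucker", "Dan Bailey",
               "Patrick Mahomes", "Carson Wentz",
               "Alvin Kamara", "Dalvin Cook", "Saquon Barkley",
               "Travis Kelce", "Zach Ertz",
               "Davante Adams", "Tyreek Hill", "Michael Thomas", "Odell Beckham"),
  position = c("K", "K", "QB", "QB", "RB", "RB", "RB", "TE", "TE",
               "WR", "WR", "WR", "WR"),
  WAR      = c(0.27, -0.62, 2.26, 0.06, 3.57, 3.48, -0.05, 2.79, -0.22,
               2.74, 2.35, -0.04, 0.28),
  stringsAsFactors = FALSE
)

# Players to draw per position (DST omitted: not present in the sample data)
n_per_position <- c(K = 1, QB = 1, RB = 2, TE = 1, WR = 3)

# Draw one lineup: sample the required number of rows within each position
draw_lineup <- function(players, n_per_position) {
  idx <- unlist(lapply(names(n_per_position), function(pos) {
    rows <- which(players$position == pos)
    # sample.int() avoids the sample() surprise when `rows` has length 1
    rows[sample.int(length(rows), n_per_position[[pos]])]
  }))
  players[idx, ]
}

# Run many simulations and return, per player, the share of top lineups
# (by total WAR) that include that player
simulate_top_share <- function(players, n_per_position,
                               n_sims = 1000, top = 0.10) {
  lineups <- lapply(seq_len(n_sims), function(i) {
    cbind(sim = i, draw_lineup(players, n_per_position))
  })
  sims <- do.call(rbind, lineups)

  # Total WAR of each simulated lineup
  totals <- tapply(sims$WAR, sims$sim, sum)

  # Lineups at or above the (1 - top) quantile count as "top" (ties included)
  cutoff   <- quantile(totals, 1 - top, names = FALSE)
  top_sims <- as.integer(names(totals)[totals >= cutoff])
  in_top   <- sims[sims$sim %in% top_sims, ]

  # Share of top lineups containing each player
  share <- vapply(players$player, function(p) {
    sum(in_top$player == p) / length(top_sims)
  }, numeric(1))
  sort(share, decreasing = TRUE)
}

set.seed(42)  # for reproducibility only
top_share <- simulate_top_share(players, n_per_position)
head(top_share)
```

With the full data, the same idea carries over to the tidyverse result above: group the stacked simulations by `sim`, sum `WAR`, keep the top 10% of `sim` values, and count how often each `player` appears among them.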