R programming: choose every combination of factors from a data.frame
I have a data.table that looks like this:
     Position              Name Salary
  1:       WR       Julio Jones   9300
  2:       WR Odell Beckham Jr.   9200
  3:       WR  Demaryius Thomas   9100
  4:       WR        Dez Bryant   8700
  5:       QB     Aaron Rodgers   8600
 ---
904:       TE       Jean Sifrin   2500
905:       TE         Khari Lee   2500
906:       TE       John Peters   2500
907:      DST             Bears   2400
908:      DST           Raiders   2300
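I want to get every possible team combination made up of 1 QB, 3 WRs, 2 RBs, and 1 TE, with no player repeated. I don't know how to select combinations in R, so any pointers would be appreciated.
I got the CSV file containing the data from here: https://www.draftkings.com/contest/draftteam/7962690
If you download it and want to play with the data, this is my code so far: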
library(data.table)
library(dplyr)
dk <- read.csv(".../Downloads/DKSalaries.csv")
dk.dt <- as.data.table(dk)
dk.dt <- select(dk.dt, Position, Name, Salary, AvgPointsPerGame)
Best answer
Using a smaller sample of the data, here is some code (not very efficient!) that should extract these groupings.
## Create a smaller sample to work with: the first 4 rows of each position, shuffled
dk <- read.csv("DKSalaries.csv")
dk <- lapply(split(dk, dk$Position), function(x) x[sample(4), ])
dk <- dk[-1] # drop the first group (DST in this data), which is not needed
dk <- dk[c("QB", "WR", "RB", "TE")] # put positions in the order the index list 'rows' expects
## Expected number of combinations
## #QBs * choose(#WR, 3) * choose(#RB, 2) * #TE
4*choose(4,3)*choose(4,2)*4
# 384
## Get indices of combos within each group
rows <- list(t(1:4), combn(4,3), combn(4,2), t(1:4)) # one matrix per position (QB, WR, RB, TE); each column is one possible pick
dims <- sapply(rows, NCOL)
inds <- expand.grid(lapply(dims, seq_len)) # indices of combinations in 'rows'; lapply always returns a list, even when all dims are equal
dim(inds)
# [1] 384 4
## Function to extract a group
extract <- function(ind) {
  g <- inds[ind, ]  # one row of inds: which column of 'rows' to use for each position
  do.call(rbind, lapply(1:4, function(i) dk[[i]][rows[[i]][, g[[i]]], ]))
}
## So, one combination would be
extract(1)
#        Position              Name Salary           GameInfo AvgPointsPerGame
# QB.5         QB     Aaron Rodgers   8600  GB@Chi 01:00PM ET           23.428
# WR.1         WR       Julio Jones   9300 Phi@Atl 07:10PM ET           21.293
# WR.3         WR  Demaryius Thomas   9100 Bal@Den 04:25PM ET           22.812
# WR.2         WR Odell Beckham Jr.   9200 NYG@Dal 08:30PM ET           26.417
# RB.13        RB    Jamaal Charles   7900  KC@Hou 01:00PM ET           17.093
# RB.20        RB      Arian Foster   7600  KC@Hou 01:00PM ET           22.808
# TE.191       TE      Travis Kelce   4800  KC@Hou 01:00PM ET           11.825
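For anyone without the CSV at hand, the same index-based idea can be tried on a toy roster. This is only a minimal sketch: the player labels (QB1, WR1, ...), the `need` vector, and the helper `lineup()` are hypothetical names made up for illustration, and none of them come from the DraftKings file.

```r
# Toy roster with made-up player labels (an assumption, not real data)
players <- list(
  QB = paste0("QB", 1:2),
  WR = paste0("WR", 1:4),
  RB = paste0("RB", 1:3),
  TE = paste0("TE", 1:2)
)
need <- c(QB = 1, WR = 3, RB = 2, TE = 1)  # players required per position

# One matrix per position; each column is one way to pick the required players.
# combn() never repeats an element within a column, so no player is duplicated.
picks <- Map(function(p, k) combn(p, k), players, need)

# Every way to choose one column from each position's matrix
grid <- expand.grid(lapply(picks, function(m) seq_len(ncol(m))))
nrow(grid)
# [1] 48   (2 * choose(4, 3) * choose(3, 2) * 2)

# Hypothetical helper: build the i-th lineup as a character vector of 7 labels
lineup <- function(i) {
  unlist(lapply(seq_along(picks), function(p) picks[[p]][, grid[i, p]]))
}

lineup(1)
# [1] "QB1" "WR1" "WR2" "WR3" "RB1" "RB2" "TE1"
```

Working with labels instead of row indices keeps the sketch short; with a real data frame the columns of each matrix would hold row numbers, as in the code above.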
Then, to get all of the combinations in a list, you could do:
res <- lapply(seq_len(nrow(inds)), extract) # 384 combinations for this sample
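Because the count multiplies across positions, it may be worth estimating it before running this on the full file. A rough sketch follows, assuming hypothetical pool sizes (30 QBs, 80 WRs, 60 RBs, 40 TEs; the real per-position counts are not given here) and a made-up helper `n_lineups()`:

```r
# Number of lineups with 1 QB, 3 WR, 2 RB and 1 TE for given pool sizes.
# n_lineups() is a hypothetical helper, not part of any package.
n_lineups <- function(n_qb, n_wr, n_rb, n_te) {
  choose(n_qb, 1) * choose(n_wr, 3) * choose(n_rb, 2) * choose(n_te, 1)
}

n_lineups(4, 4, 4, 4)      # the 4-per-position sample: 384
n_lineups(30, 80, 60, 40)  # assumed pool sizes: on the order of 1e11 lineups
```

At that scale, building every lineup as its own data frame is unlikely to fit in memory, so the player pool would probably need to be narrowed before enumerating.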