I want to generate combinations of 8 names from a column in an R data frame based on conditions from other columns in the same data frame
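I have a data frame with 20 players from 4 different teams (5 players per team), each assigned a salary from a fantasy draft. I would like to create all combinations of 8 players whose total salary is 10000 or less and whose total points are greater than x, excluding any combination that includes 4 or more players from the same team.
Here is what my data frame looks like: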
Team Player K D A LH Points Salary PPS
4 ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209 1622 1.3692
2 ATN Supreme 6.8 5.3 7.1 229.4 21.954 1578 1.3913
1 ATN sasu 3.6 6.4 11.0 95.7 19.357 1244 1.5560
3 ATN eL lisasH 2 2.6 6.1 7.9 29.7 12.037 998 1.2061
5 ATN Nisha 2.7 5.6 7.5 48.2 12.282 955 1.2861
11 CL Swiftending 6.0 5.8 7.8 360.5 22.285 1606 1.3876
13 CL Pajkatt 13.3 7.5 9.3 326.8 37.248 1489 2.5015
15 CL SexyBamboe 6.3 8.5 9.3 168.0 20.660 1256 1.6449
14 CL EGM 2.8 6.0 13.5 78.8 21.988 989 2.2233
12 CL Saksa 2.5 6.5 10.5 59.8 15.898 967 1.6441
51 DBEARS Ace 7.0 3.4 6.9 195.6 23.596 1578 1.4953
31 DBEARS HesteJoe 5.4 5.4 6.1 176.7 16.927 1512 1.1195
61 DBEARS Miggel 2.8 6.8 11.0 141.8 17.818 1212 1.4701
21 DBEARS Noia 3.0 6.0 8.0 36.1 13.161 970 1.3568
41 DBEARS Ryze 2.7 4.7 6.7 74.6 12.166 937 1.2984
8 GB Keyser Soze 6.0 5.0 5.6 316.0 19.120 1602 1.1935
9 GB Madara 5.4 5.3 6.6 334.5 19.405 1577 1.2305
10 GB SkyLark 1.8 5.3 7.0 71.8 10.218 1266 0.8071
7 GB MNT 2.3 5.9 6.1 85.6 9.316 1007 0.9251
6 GB SKANKS224 1.4 7.6 7.4 52.5 7.565 954 0.7930
I followed the general concept described in this post: I want to generate combinations of 5 names from a column in an R data frame, whose values in a different column add up to a certain number or less
and adapted the code to fit my needs. This is what I have so far:
## make a list of all combinations of 8 of Player, Points and Salary
xx <- with(FantasyPlayers, lapply(list(as.character(Player), Points, Salary), combn, 8))
## convert the names to a string,
## find the column sums of the others,
## set the names
yy <- setNames(
lapply(xx, function(x) {
if(typeof(x) == "character") apply(x, 2, toString) else colSums(x)
}),
names(FantasyPlayers)[c(2, 7, 8)]
)
## coerce to data.frame
newdf <- as.data.frame(yy)
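With the code above I can generate every possible 8-player lineup and then subset the lineups on various criteria (total salary and number of points), but I am having trouble excluding the lineups that have more than 3 players from the same team.
I think those lineups need to be excluded from newdf, but I really don't know where to start.
Here is the dput() output: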
structure(list(Team = c("ATN", "ATN", "ATN", "ATN", "ATN", "CL",
"CL", "CL", "CL", "CL", "DBEARS", "DBEARS", "DBEARS", "DBEARS",
"DBEARS", "GB", "GB", "GB", "GB", "GB"), Player = structure(c(2L,
5L, 4L, 1L, 3L, 15L, 12L, 14L, 11L, 13L, 16L, 18L, 19L, 20L,
21L, 6L, 7L, 10L, 8L, 9L), .Label = c("eL lisasH 2", "ExoticDeer",
"Nisha", "sasu", "Supreme", "Keyser Soze", "Madara", "MNT", "SKANKS224",
"SkyLark", "EGM", "Pajkatt", "Saksa", "SexyBamboe", "Swiftending",
"Ace", "DruidzOzoneShoc", "HesteJoe", "Miggel", "Noia", "Ryze"
), class = "factor"), K = c(6.1, 6.8, 3.6, 2.6, 2.7, 6, 13.3,
6.3, 2.8, 2.5, 7, 5.4, 2.8, 3, 2.7, 6, 5.4, 1.8, 2.3, 1.4), D = c(3.3,
5.3, 6.4, 6.1, 5.6, 5.8, 7.5, 8.5, 6, 6.5, 3.4, 5.4, 6.8, 6,
4.7, 5, 5.3, 5.3, 5.9, 7.6), A = c(6.4, 7.1, 11, 7.9, 7.5, 7.8,
9.3, 9.3, 13.5, 10.5, 6.9, 6.1, 11, 8, 6.7, 5.6, 6.6, 7, 6.1,
7.4), LH = c(306.9, 229.4, 95.7, 29.7, 48.2, 360.5, 326.8, 168,
78.8, 59.8, 195.6, 176.7, 141.8, 36.1, 74.6, 316, 334.5, 71.8,
85.6, 52.5), Points = c(22.209, 21.954, 19.357, 12.037, 12.282,
22.285, 37.248, 20.66, 21.988, 15.898, 23.596, 16.927, 17.818,
13.161, 12.166, 19.12, 19.405, 10.218, 9.316, 7.565), Salary = c(1622,
1578, 1244, 998, 955, 1606, 1489, 1256, 989, 967, 1578, 1512,
1212, 970, 937, 1602, 1577, 1266, 1007, 954), PPS = c(1.3692,
1.3913, 1.556, 1.2061, 1.2861, 1.3876, 2.5015, 1.6449, 2.2233,
1.6441, 1.4953, 1.1195, 1.4701, 1.3568, 1.2984, 1.1935, 1.2305,
0.8071, 0.9251, 0.793)), .Names = c("Team", "Player", "K", "D",
"A", "LH", "Points", "Salary", "PPS"), class = "data.frame", row.names = c("4",
"2", "1", "3", "5", "11", "13", "15", "14", "12", "51", "31",
"61", "21", "41", "8", "9", "10", "7", "6"))
Here is one way to do it:
splt.names <- strsplit(as.character(newdf$Player), ", ")
indices <- lapply(splt.names, function(x) match(x, FantasyPlayers$Player))
exclude <- lapply(indices, function(x) any(table(FantasyPlayers$Team[x]) > 3))
newdf2 <- newdf[!unlist(exclude), ]
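First, split the Player column on ", ". Then match the player names against the Player column of FantasyPlayers. With those indices, the main work is done by any(table(FantasyPlayers$Team[x]) > 3). This checks whether any team appears more than three times in a lineup, which indicates 4 or more players from the same team.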
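To see the check in isolation, here is a minimal sketch on made-up data. The names players, lineups, too_many and kept are hypothetical and only for illustration; the logic mirrors the strsplit/match/table steps above.

```r
# Toy data (hypothetical): six players on two teams
players <- data.frame(
  Team   = c("A", "A", "A", "A", "B", "B"),
  Player = c("p1", "p2", "p3", "p4", "p5", "p6"),
  stringsAsFactors = FALSE
)

# Two candidate lineups stored as comma-separated strings, like newdf$Player
lineups <- c("p1, p2, p3, p4, p5",   # four players from team A
             "p1, p2, p3, p5, p6")   # at most three from any one team

# Look up each lineup's teams, then flag lineups where a team count exceeds 3
too_many <- vapply(strsplit(lineups, ", "), function(x) {
  teams <- players$Team[match(x, players$Player)]
  any(table(teams) > 3)
}, logical(1))

kept <- lineups[!too_many]
```

Only the second lineup survives, because the first one has four players from team A.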
I think it is better to build this in long format:
Building the teams
library(data.table)
setDT(FantasyPlayers)
xx <- combn(as.character(FantasyPlayers$Player), 8)
mxx <- setDT(melt(xx, varnames=c("jersey_no", "team_no"), value.name="Player"))
head(mxx,10)
# jersey_no team_no Player
# 1: 1 1 ExoticDeer
# 2: 2 1 Supreme
# 3: 3 1 sasu
# 4: 4 1 eL lisasH 2
# 5: 5 1 Nisha
# 6: 6 1 Swiftending
# 7: 7 1 Pajkatt
# 8: 8 1 SexyBamboe
# 9: 1 2 ExoticDeer
# 10: 2 2 Supreme
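Each group of 8 players shares a team_no and is indexed by jersey_no. See ?melt.array for how this works. setDT just converts the resulting data.frame to a data.table to make merging easier.
Merging to recover the Player attributes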
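If you would rather not rely on a matrix method for melt, the same long layout can probably be built directly from the combn matrix with row() and col(). This is only a sketch on a toy input; players and mxx_alt are hypothetical names, not part of the code above.

```r
library(data.table)

# Hypothetical toy input: all pairs drawn from four names
players <- c("p1", "p2", "p3", "p4")
xx <- combn(players, 2)   # 2 x 6 character matrix, one column per team

# row() gives the position within a team, col() gives the team number;
# as.vector() flattens all three in the same column-major order
mxx_alt <- data.table(
  jersey_no = as.vector(row(xx)),
  team_no   = as.vector(col(xx)),
  Player    = as.vector(xx)
)
head(mxx_alt, 4)
```

The columns have the same meaning as in mxx: one row per player per team, with team_no identifying the combination.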
FantasyTeams <- FantasyPlayers[mxx, on="Player"]
# Team Player K D A LH Points Salary PPS jersey_no team_no
# 1: ATN ExoticDeer 6.1 3.3 6.4 306.9 22.209 1622 1.3692 1 1
# 2: ATN Supreme 6.8 5.3 7.1 229.4 21.954 1578 1.3913 2 1
# 3: ATN sasu 3.6 6.4 11.0 95.7 19.357 1244 1.5560 3 1
# 4: ATN eL lisasH 2 2.6 6.1 7.9 29.7 12.037 998 1.2061 4 1
# 5: ATN Nisha 2.7 5.6 7.5 48.2 12.282 955 1.2861 5 1
# ---
# 1007756: GB Keyser Soze 6.0 5.0 5.6 316.0 19.120 1602 1.1935 4 125970
# 1007757: GB Madara 5.4 5.3 6.6 334.5 19.405 1577 1.2305 5 125970
# 1007758: GB SkyLark 1.8 5.3 7.0 71.8 10.218 1266 0.8071 6 125970
# 1007759: GB MNT 2.3 5.9 6.1 85.6 9.316 1007 0.9251 7 125970
# 1007760: GB SKANKS224 1.4 7.6 7.4 52.5 7.565 954 0.7930 8 125970
By default, only the first and last few rows of a data.table are printed. To inspect the whole thing, try ?View or look at the arguments of ?print.data.table.
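As a quick sanity check on the sizes in the printout above, the number of lineups and the number of rows in the long table can be worked out with choose(). The variable names here are only illustrative.

```r
n_teams <- choose(20, 8)   # number of 8-player lineups drawn from 20 players
n_rows  <- n_teams * 8     # one row per player per lineup in the long table
c(n_teams = n_teams, n_rows = n_rows)
```

These agree with the last team_no (125970) and the last row number (1007760) shown for FantasyTeams.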
Filtering to the set of teams with the chosen characteristics
Filter to the team_no values with no more than three players from the same Team...
my_teams <- FantasyTeams[, max(table(Team)) <= 3, by=team_no][V1==TRUE]$team_no
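V1 is the default name assigned to the constructed variable max(table(Team)) <= 3. This is not lightning fast, but because some teams have now been excluded, the later subsetting step should be faster: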
my_new_teams <-
FantasyTeams[team_no %in% my_teams, sum(Salary) < 10000, by=team_no][V1==TRUE]$team_no
To save a few keystrokes and microseconds, replace V1==TRUE with (V1). That is the idiomatic way.
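The question also asks for a minimum total of points, which the steps above do not apply. One possible way to fold all three conditions into a single grouped pass is sketched below on toy data. Everything here (toy_players, toy_teams, and the thresholds max_same_team, salary_cap, min_points) is a hypothetical stand-in, and the salary test uses <= to match the "10000 or less" wording in the question.

```r
library(data.table)

# Hypothetical toy data: four players on two teams, lineups of two players
toy_players <- data.table(
  Team   = c("A", "A", "B", "B"),
  Player = c("p1", "p2", "p3", "p4"),
  Points = c(10, 20, 30, 40),
  Salary = c(100, 200, 300, 400)
)
xx <- combn(toy_players$Player, 2)
toy_teams <- data.table(team_no = as.vector(col(xx)), Player = as.vector(xx))
toy_teams <- toy_players[toy_teams, on = "Player"]   # attach Team, Points, Salary

# Illustrative thresholds (assumptions, not taken from the question)
max_same_team <- 1
salary_cap    <- 500
min_points    <- 35

# One logical per team; [(ok)] keeps the teams where it is TRUE
keep <- toy_teams[, .(ok = max(table(Team)) <= max_same_team &&
                           sum(Salary) <= salary_cap &&
                           sum(Points) > min_points),
                  by = team_no][(ok)]$team_no
keep
```

Naming the logical column (ok here) instead of relying on the default V1 makes the filter a little easier to read. Whether one pass beats the two-step filtering above on the full data is something you would need to time yourself.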
Recovering rosters from a set of teams
To get the roster associated with each team, join/merge with mxx:
mxx[.(team_no = my_new_teams), on="team_no"]
If you want the players listed on a single row, as in the OP:
mxx[.(team_no = my_new_teams), .(roster = toString(Player)), on="team_no", by=.EACHI]
If you want summary statistics for each team, you need to join with FantasyTeams:
FantasyTeams[.(team_no = my_new_teams), .(
roster = toString(Player),
tot_salary = sum(Salary),
tot_points = sum(Points)
), on="team_no", by=.EACHI]
# team_no roster tot_salary tot_points
# 1: 3716 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, Ryze 9913 149.018
# 2: 3720 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, MNT 9983 146.168
# 3: 3721 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Noia, SKANKS224 9930 144.417
# 4: 3725 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, MNT 9950 145.173
# 5: 3726 ExoticDeer, Supreme, sasu, Swiftending, EGM, Saksa, Ryze, SKANKS224 9897 143.422
# ---
# 40202: 125663 EGM, Saksa, Miggel, Noia, Ryze, Keyser Soze, MNT, SKANKS224 8638 117.032
# 40203: 125664 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, MNT 8925 119.970
# 40204: 125665 EGM, Saksa, Miggel, Noia, Ryze, Madara, SkyLark, SKANKS224 8872 118.219
# 40205: 125666 EGM, Saksa, Miggel, Noia, Ryze, Madara, MNT, SKANKS224 8613 117.317
# 40206: 125667 EGM, Saksa, Miggel, Noia, Ryze, SkyLark, MNT, SKANKS224 8302 108.130
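To understand what by=.EACHI does, some background is needed. The merge syntax here is DT[i, j, on=cols, by=.EACHI].
- If j and by are left out, it simply does the merge, as in the construction of FantasyTeams.
- If by is omitted but j is included, j is computed after the merge.
- If by=.EACHI, j is computed separately for each value in i.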
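The three cases can be compared side by side on a tiny table. This is a sketch with made-up data; toy, wanted, joined, overall and per_team are hypothetical names.

```r
library(data.table)

# Hypothetical toy table in long format: two teams of two players
toy <- data.table(
  team_no = c(1L, 1L, 2L, 2L),
  Player  = c("p1", "p2", "p3", "p4"),
  Salary  = c(10, 20, 30, 40)
)
wanted <- c(1L, 2L)

# No j, no by: a plain join that returns the matching rows
joined <- toy[.(team_no = wanted), on = "team_no"]

# j without by: j is evaluated once over all matched rows
overall <- toy[.(team_no = wanted), .(tot_salary = sum(Salary)), on = "team_no"]

# j with by = .EACHI: j is evaluated once per row of i
per_team <- toy[.(team_no = wanted), .(tot_salary = sum(Salary)),
                on = "team_no", by = .EACHI]
```

Here joined has four rows, overall is a single row with the grand total, and per_team has one row per requested team_no.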