Find all combinations of 3 in a column (player_name), grouped by another column (team_name + match_id), and count instances of each combo
I've been teaching myself some R by playing around with different sports data, but I've hit a wall.
match_id | player_name | player_team | points |
---|---|---|---|
Match1 | Player 1 | Team 1 | 20 |
Match1 | Player 2 | Team 1 | 23 |
Match1 | Player 3 | Team 1 | 24 |
Match1 | Player 4 | Team 2 | 26 |
Match1 | Player 5 | Team 2 | 21 |
Match1 | Player 6 | Team 2 | 22 |
Match1 | Player 7 | Team 2 | 43 |
Match1 | Player 8 | Team 2 | 38 |
Match2 | Player 9 | Team 3 | 24 |
Match2 | Player 10 | Team 3 | 29 |
Match2 | Player 11 | Team 3 | 23 |
Match2 | Player 12 | Team 3 | 22 |
Match2 | Player 13 | Team 4 | 20 |
Match2 | Player 14 | Team 4 | 32 |
Match3 | Player 15 | Team 5 | 24 |
Match3 | Player 16 | Team 5 | 27 |
Match3 | Player 17 | Team 5 | 23 |
Match3 | Player 18 | Team 5 | 20 |
Match3 | Player 19 | Team 5 | 23 |
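The data runs for a whole season, so teams and players repeat throughout the data.
I'm trying to take the above and find all combinations of 3 different players from the same team who each scored 20 or more points in a match (the points have already been filtered to include only 20+), then count how many matches each combination appeared in. That would tell me which groups of 3 players on the same team often score 20+ when playing together.
Because some players on different teams share the same name, I used mutate to combine player_team with player_name, and also player_team with match_id, simply because some attempts ended up combining players from different teams.
The closest I've got is the code below, but it only works for combinations of 2.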
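library(dplyr)
library(igraph)

# Keep 20+ performances; the full data set filters on `disposals`,
# which appears to correspond to `points` in the sample above
data <- players %>%
  filter(disposals >= 20) %>%
  select(match_id, player_name, player_team) %>%
  # Prefix both keys with the team so that same-named players on
  # different teams, and the two teams in one match, stay separate
  mutate(
    match_id = paste(player_team, match_id, sep = "_"),
    player_name = paste(player_team, player_name, sep = "_")
  ) %>%
  select(match_id, player_name)

# crossprod() of the match-by-player table counts, for every pair of
# players, the team-matches in which both appear
dataout <- get.data.frame(
  graph_from_adjacency_matrix(
    crossprod(table(data)),
    mode = "directed",
    weighted = TRUE,
    diag = FALSE
  )
)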
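This gives me the following (the weights are based on occurrences across the whole data set rather than the example above; every team has played 3 matches so far):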
from | to | weight |
---|---|---|
Team 1_Player 1 | Team 1_Player 2 | 1 |
Team 1_Player 1 | Team 1_Player 3 | 3 |
Team 1_Player 2 | Team 1_Player 3 | 1 |
Team 2_Player 4 | Team 2_Player 5 | 2 |
Team 2_Player 4 | Team 2_Player 6 | 1 |
Team 2_Player 4 | Team 2_Player 7 | 3 |
Team 2_Player 4 | Team 2_Player 8 | 3 |
Team 2_Player 5 | Team 2_Player 6 | 1 |
Team 2_Player 5 | Team 2_Player 7 | 2 |
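Note that the combinations are not repeated in every possible order (i.e. it recognizes that Team 1_Player 1 + Team 1_Player 2 is the same as Team 1_Player 2 + Team 1_Player 1).
Is there any other solution that would let me include three players (or more) instead of two?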
You can use the function combn() with m = 3 to get all possible triples:
library(tidyverse)

data <- tribble(
  ~match_id, ~player_name, ~team_name, ~points,
  "Match1", 1L, 1L, 20L,
  "Match1", 2L, 1L, 23L,
  "Match1", 3L, 1L, 24L,
  "Match1", 4L, 2L, 26L,
  "Match1", 5L, 2L, 21L,
  "Match1", 6L, 2L, 22L,
  "Match1", 7L, 2L, 43L,
  "Match1", 8L, 2L, 38L,
  "Match2", 9L, 3L, 24L,
  "Match2", 10L, 3L, 29L,
  "Match2", 11L, 3L, 23L,
  "Match2", 12L, 3L, 22L,
  "Match2", 13L, 4L, 20L,
  "Match2", 14L, 4L, 32L,
  "Match3", 15L, 5L, 24L,
  "Match3", 16L, 5L, 27L,
  "Match3", 17L, 5L, 23L,
  "Match3", 18L, 5L, 20L,
  "Match3", 19L, 5L, 23L
)

combinations_data <-
  data %>%
  filter(points >= 20) %>%
  # One row per team and match; the remaining columns go into the list column `data`.
  # Naming the argument (`data = ...`) avoids the
  # "All elements of `...` must be named" warning from nest()
  nest(data = -c(team_name, match_id)) %>%
  mutate(
    # combn() fails when a team has fewer than 3 distinct players;
    # possibly() turns that error into NA
    combinations = data %>% map(possibly(~ {
      .x$player_name %>% unique() %>% combn(3)
    }, NA))
  )
combinations_data %>%
  filter(match_id == "Match1" & team_name == 2) %>%
  pull(combinations) %>%
  first()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,]    4    4    4    4    4    4    5    5    5     6
#> [2,]    5    5    5    6    6    7    6    6    7     7
#> [3,]    6    7    8    7    8    8    7    8    8     8
Created on 2022-04-08 by the reprex package (v2.0.0)
In Match1 there are 10 unique combinations of three players from team 2, each of whom scored at least 20.
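The matrix above lists the triples for one team in one match, while the question also asks how many matches each triple appears in. One possible way to get there is to turn each triple into a single label and count the labels per team. This is a minimal sketch, not part of the original answer: the toy data, the helper name `triples()`, and the result name `triple_counts` are all made up for illustration, and players are sorted before combining so that the same three players always produce the same label.

```r
library(tibble)
library(dplyr)
library(tidyr)

# Hypothetical toy data: one team over three matches
scores <- tribble(
  ~match_id, ~team_name, ~player_name, ~points,
  "Match1", "Team 1", "A", 20L,
  "Match1", "Team 1", "B", 23L,
  "Match1", "Team 1", "C", 24L,
  "Match1", "Team 1", "D", 31L,
  "Match2", "Team 1", "A", 22L,
  "Match2", "Team 1", "B", 25L,
  "Match2", "Team 1", "C", 21L,
  "Match3", "Team 1", "A", 27L,
  "Match3", "Team 1", "B", 12L,
  "Match3", "Team 1", "C", 30L
)

# Hypothetical helper: every sorted triple of distinct players,
# collapsed into one "A + B + C" label per triple
triples <- function(players) {
  players <- sort(unique(players))
  if (length(players) < 3) return(character(0))  # no triple possible
  combn(players, 3, FUN = paste, collapse = " + ")
}

triple_counts <- scores %>%
  filter(points >= 20) %>%
  group_by(team_name, match_id) %>%
  # One list element of triple labels per team and match
  summarise(combo = list(triples(player_name)), .groups = "drop") %>%
  # One row per triple; team-matches without any triple drop out
  unnest(combo) %>%
  # Number of matches in which each triple occurred, most frequent first
  count(team_name, combo, name = "matches", sort = TRUE)

print(triple_counts)
```

In this toy data the triple "A + B + C" is counted in two matches (Match1 and Match2), the three triples involving D in one match each, and Match3 contributes nothing because only two players reached 20. Changing the `3` passed to `combn()` would give larger groups in the same way.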