Find all combinations of 3 in a column (player_name), grouped by another column (team_name + match_id) and count instances of each combo

Question

I've been teaching myself some R by playing around with different sports data, but I've hit a wall.

match_id  player_name  player_team  points
Match1    Player 1     Team 1       20
Match1    Player 2     Team 1       23
Match1    Player 3     Team 1       24
Match1    Player 4     Team 2       26
Match1    Player 5     Team 2       21
Match1    Player 6     Team 2       22
Match1    Player 7     Team 2       43
Match1    Player 8     Team 2       38
Match2    Player 9     Team 3       24
Match2    Player 10    Team 3       29
Match2    Player 11    Team 3       23
Match2    Player 12    Team 3       22
Match2    Player 13    Team 4       20
Match2    Player 14    Team 4       32
Match3    Player 15    Team 5       24
Match3    Player 16    Team 5       27
Match3    Player 17    Team 5       23
Match3    Player 18    Team 5       20
Match3    Player 19    Team 5       23

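The data will run for the whole season, so teams and players will repeat as the data grows. I'm trying to take the above and find all combinations of 3 different players from the same team who each scored 20 or more points in a match (the scores have already been filtered to include only 20+). I then want to count how many matches each combination appears in, so I can tell which groups of 3 players on the same team most often score 20+ when playing together.

Because some players on different teams share the same name, I used mutate to combine player_team with player_name, and also player_team with match_id, simply because some attempts ended up grouping together players from different teams.

The closest I can get is with the code below, but it only works for combinations of 2.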

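library(dplyr)
library(igraph)

# Keep only the 20+ performances
data <- players %>%
  filter(disposals >= 20)

data <- data %>%
  select(match_id, player_name, player_team)

# Prefix both keys with the team so that same-named players on
# different teams, and the two sides of one match, stay separate
data <- data %>%
  mutate(match_id = paste(player_team, match_id, sep = "_")) %>%
  mutate(player_name = paste(player_team, player_name, sep = "_"))

data <- data %>%
  select(match_id, player_name)

# crossprod() of the match-by-player table counts, for every pair of
# players, the number of team-matches in which both appear
dataout <- get.data.frame(
  graph_from_adjacency_matrix(
    crossprod(table(data)),
    mode = "directed",
    weighted = TRUE,
    diag = FALSE
  )
)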

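This gives me the output below (the weights are based on occurrences across the full data set rather than the sample above; every team has played 3 matches so far).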

from	to	weight
Team 1_Player 1	Team 1_Player 2	1
Team 1_Player 1	Team 1_Player 3	3
Team 1_Player 2	Team 1_Player 3	1
Team 2_Player 4	Team 2_Player 5	2
Team 2_Player 4	Team 2_Player 6	1
Team 2_Player 4	Team 2_Player 7	3
Team 2_Player 4	Team 2_Player 8	3
Team 2_Player 5	Team 2_Player 6	1
Team 2_Player 5	Team 2_Player 7	2

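Note that combinations are not repeated in every possible order (i.e. it recognises that Team 1_Player 1 + Team 1_Player 2 is the same as Team 1_Player 2 + Team 1_Player 1).

Is there any other solution that would let me include three players (or more) instead of two?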

Answer 1

You can use the function combn() with m = 3 to get all possible triples:

library(tidyverse)

data <- tribble(
  ~match_id, ~player_name, ~team_name, ~points,
   "Match1",           1L,         1L,     20L,
   "Match1",           2L,         1L,     23L,
   "Match1",           3L,         1L,     24L,
   "Match1",           4L,         2L,     26L,
   "Match1",           5L,         2L,     21L,
   "Match1",           6L,         2L,     22L,
   "Match1",           7L,         2L,     43L,
   "Match1",           8L,         2L,     38L,
   "Match2",           9L,         3L,     24L,
   "Match2",          10L,         3L,     29L,
   "Match2",          11L,         3L,     23L,
   "Match2",          12L,         3L,     22L,
   "Match2",          13L,         4L,     20L,
   "Match2",          14L,         4L,     32L,
   "Match3",          15L,         5L,     24L,
   "Match3",          16L,         5L,     27L,
   "Match3",          17L,         5L,     23L,
   "Match3",          18L,         5L,     20L,
   "Match3",          19L,         5L,     23L
  )

combinations_data <-
  data %>%
  filter(points >= 20) %>%
  # One row per team per match; the other columns go into a list-column named `data`
  nest(data = -c(team_name, match_id)) %>%
  mutate(
    # combn() errors when a group has fewer than 3 players; possibly() returns NA instead
    combinations = data %>% map(possibly(~ {
      .x$player_name %>% unique() %>% combn(3)
    }, NA))
  )

combinations_data %>%
  filter(match_id == "Match1" & team_name == 2) %>%
  pull(combinations) %>%
  first()
#>      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
#> [1,]    4    4    4    4    4    4    5    5    5     6
#> [2,]    5    5    5    6    6    7    6    6    7     7
#> [3,]    6    7    8    7    8    8    7    8    8     8

Created on 2022-04-08 by the reprex package (v2.0.0)

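In Match1, team 2 has five players with 20 or more points, which gives choose(5, 3) = 10 unique combinations of three players. Each column of the matrix above is one such combination.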
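
To get from the per-match triples to the count the question asks for, one option is to turn each triple into a text label and tally the labels across matches. The following is a minimal base R sketch of that idea, not part of the reprex output: `count_triples` and the `scores` sample data are hypothetical names made up for illustration, and the column names `player_team`, `player_name` and `match_id` are taken from the question. It assumes the data has already been filtered to 20+ performances.

```r
# Hypothetical sample data: one team, two matches, already filtered to 20+.
scores <- data.frame(
  match_id    = c("Match1", "Match1", "Match1", "Match1",
                  "Match2", "Match2", "Match2", "Match2"),
  player_team = "Team 1",
  player_name = c("A", "B", "C", "D", "A", "B", "C", "E"),
  stringsAsFactors = FALSE
)

# Count, per team, how many matches each group of k players appears in.
# (Hypothetical helper, written for this sketch.)
count_triples <- function(df, k = 3) {
  keys <- unique(df[, c("player_team", "match_id")])
  pieces <- lapply(seq_len(nrow(keys)), function(i) {
    in_group <- df$player_team == keys$player_team[i] &
      df$match_id == keys$match_id[i]
    # Sorting first means each group of players gets exactly one label,
    # whatever order the rows arrive in.
    players <- sort(unique(df$player_name[in_group]))
    if (length(players) < k) return(NULL)  # not enough players for a combo
    data.frame(
      player_team = keys$player_team[i],
      combo = combn(players, k, FUN = paste, collapse = " + "),
      stringsAsFactors = FALSE
    )
  })
  all_combos <- do.call(rbind, pieces)
  if (is.null(all_combos)) {
    return(data.frame(player_team = character(0), combo = character(0),
                      n_matches = integer(0)))
  }
  # Each row is one appearance of a combo in one match, so summing 1s
  # gives the number of matches per combo.
  all_combos$n_matches <- 1L
  counts <- aggregate(n_matches ~ player_team + combo,
                      data = all_combos, FUN = sum)
  counts[order(-counts$n_matches), ]
}

triple_counts <- count_triples(scores)
print(triple_counts)
```

In this sample, "A + B + C" is the only triple present in both matches, so it comes out on top with a count of 2. Keeping `player_team` as a separate column does the same job as pasting the team onto the player name in the question, and passing `k = 4` would count groups of four instead.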
