How to match pairs of values and then subtract the values of a column?

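I am currently working with a data set containing all Grand Slam tennis matches from 2010 to 2019. The data frame has two rows per match: one with information about one player (the winner) and the other with information about the other player (the loser). What the two rows of a pair have in common is the match_id variable.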

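I would like to create a new variable called rank difference. The idea is to have, on each row, the difference in ATP ranking between the winner and the loser of that match.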

Here is a subset of the data frame I am working with:

# A tibble: 9,290 x 5
# Groups:   player_id [444]
   match_id    player_id   rank winner full_name           
   <chr>       <chr>      <dbl> <fct>  <chr>               
 1 m_2019_A_0  atp_104731     6 True   Kevin Anderson      
 2 m_2019_A_1  atp_105932    20 True   Nikoloz Basilashvili
 3 m_2019_A_2  atp_105430    98 True   Radu Albot          
 4 m_2019_A_3  atp_105882   137 True   Stefano Travaglia   
 5 m_2019_A_4  atp_104269    28 True   Fernando Verdasco   
 6 m_2019_A_5  atp_104655    94 True   Pablo Cuevas        
 7 m_2019_A_7  atp_126774    15 True   Stefanos Tsitsipas  
 8 m_2019_A_8  atp_105777    21 True   Grigor Dimitrov     
 9 m_2019_A_9  atp_126207    39 True   Frances Tiafoe      
10 m_2019_A_10 atp_104745     2 True   Rafael Nadal        
# ... with 9,280 more rows

Here is what I tried, which did not work:

final_match_with_player %>%
group_by(match_id) %>%
mutate(diff_rank = rank[winner == 'True'] - rank[winner == 'False'])

Do you have any idea how I could do this?

Thank you very much!

Does this get you what you want?

library(dplyr)  # provides %>%, group_by(), mutate(), arrange(), lead(), lag()

# example data
final_match_with_player <- tibble::tribble(
  ~match_id,    ~player_id,   ~rank, ~winner, ~full_name,
  "m_2019_A_0", "atp_104731",    6L,    TRUE, "Kevin Anderson",
  "m_2019_A_1", "atp_105932",   20L,    TRUE, "Nikoloz Basilashvili",
  "m_2019_A_2", "atp_105430",   98L,    TRUE, "Radu Albot",
  "m_2019_A_3", "atp_105882",  137L,    TRUE, "Stefano Travaglia",
  "m_2019_A_0", "atp_106666",   30L,   FALSE, "Joe Bloggs",
  "m_2019_A_1", "atp_106667",   40L,   FALSE, "John Doe",
  "m_2019_A_2", "atp_106668",   50L,   FALSE, "Some Body",
  "m_2019_A_3", "atp_106669",   60L,   FALSE, "Tennis Pro"
)


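# create rank_diff: loser's rank minus winner's rank, repeated on both rows of a match
final_match_with_player %>% 
  group_by(match_id) %>% 
  mutate(rank_diff = ifelse(winner, -rank, rank),  # winner's rank counts as negative
         rank_diff = sum(rank_diff)) %>%           # sum the signed ranks within each match
  ungroup()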

Which results in:

# A tibble: 8 × 6
  match_id   player_id   rank winner full_name            rank_diff
  <chr>      <chr>      <int> <lgl>  <chr>                    <int>
1 m_2019_A_0 atp_104731     6 TRUE   Kevin Anderson              24
2 m_2019_A_1 atp_105932    20 TRUE   Nikoloz Basilashvili        20
3 m_2019_A_2 atp_105430    98 TRUE   Radu Albot                 -48
4 m_2019_A_3 atp_105882   137 TRUE   Stefano Travaglia          -77
5 m_2019_A_0 atp_106666    30 FALSE  Joe Bloggs                  24
6 m_2019_A_1 atp_106667    40 FALSE  John Doe                    20
7 m_2019_A_2 atp_106668    50 FALSE  Some Body                  -48
8 m_2019_A_3 atp_106669    60 FALSE  Tennis Pro                 -77
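
Note that the example data above stores winner as a logical, whereas the printout in the question shows it as a factor (<fct>) with the value True, rank as a double, and the tibble grouped by player_id. A minimal sketch of how the same approach might be adapted to those types is below. The matches data, the is_winner helper column, and the assumption that the loser rows are labelled "False" are all made up for illustration and are not taken from the real data set.

```r
library(dplyr)

# Hypothetical mini data set mimicking the column types in the question:
# `winner` is a factor with levels "True"/"False", `rank` is a double.
matches <- tibble::tribble(
  ~match_id,    ~player_id,   ~rank, ~winner,
  "m_2019_A_0", "atp_104731",     6, "True",
  "m_2019_A_0", "atp_106666",    30, "False",
  "m_2019_A_1", "atp_105932",    20, "True",
  "m_2019_A_1", "atp_106667",    40, "False"
) %>%
  mutate(winner = factor(winner)) %>%
  group_by(player_id)                       # mimic the grouping shown in the question

result <- matches %>%
  ungroup() %>%                             # drop the earlier grouping by player_id
  mutate(is_winner = winner == "True") %>%  # factor -> logical (assumed labels)
  group_by(match_id) %>%
  # loser's rank minus winner's rank, repeated on both rows of the match
  mutate(rank_diff = sum(if_else(is_winner, -rank, rank))) %>%
  ungroup()

result
```

Comparing the factor against the string "True" gives a logical vector, so the rest of the pipeline is unchanged from the version above.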

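Edit based on further information

It may be easier to simply arrange by match_id and rank, then mutate using the lead and lag functions inside a case_when condition: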

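# create rank_diff: opponent's rank minus own rank
# (relies on every match_id having exactly two rows)
final_match_with_player %>% 
  arrange(match_id, rank) %>% 
  mutate(rank_diff = case_when(lead(match_id) == match_id ~ lead(rank) - rank,  # first row of a match
                               TRUE ~ lag(rank) - rank))                        # second row of a match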

Which gives:

# A tibble: 8 × 6
  match_id   player_id   rank winner full_name            rank_diff
  <chr>      <chr>      <int> <lgl>  <chr>                    <int>
1 m_2019_A_0 atp_104731     6 TRUE   Kevin Anderson              24
2 m_2019_A_0 atp_106666    30 FALSE  Joe Bloggs                 -24
3 m_2019_A_1 atp_105932    20 TRUE   Nikoloz Basilashvili        20
4 m_2019_A_1 atp_106667    40 FALSE  John Doe                   -20
5 m_2019_A_2 atp_106668    50 FALSE  Some Body                   48
6 m_2019_A_2 atp_105430    98 TRUE   Radu Albot                 -48
7 m_2019_A_3 atp_106669    60 FALSE  Tennis Pro                  77
8 m_2019_A_3 atp_105882   137 TRUE   Stefano Travaglia          -77
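
The lead/lag version relies on every match_id having exactly two rows. If that might not hold in the full data, one possible variant is to compute the same opponent-minus-own difference within each group and return NA for incomplete matches. This is only a sketch under that assumption; the matches data below, including the match with a missing loser row, is hypothetical.

```r
library(dplyr)

# Hypothetical data: the second match has only one row
matches <- tibble::tribble(
  ~match_id,    ~rank, ~winner,
  "m_2019_A_0",    6L,  TRUE,
  "m_2019_A_0",   30L,  FALSE,
  "m_2019_A_1",   20L,  TRUE
)

result <- matches %>%
  group_by(match_id) %>%
  # with exactly two rows, rev(rank) is the opponent's rank on each row;
  # otherwise fall back to NA for every row of that match
  mutate(rank_diff = if (n() == 2L) rev(rank) - rank else NA_integer_) %>%
  ungroup()

result
```

This keeps the same sign convention as the lead/lag version (opponent's rank minus own rank) and does not depend on the row order.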