How to match pairs of values and then subtract the value of a column?
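I'm currently working on a dataset containing all Grand Slam tennis matches from 2010 to 2019. The data frame contains two rows per match: one with information about one player (the winner) and one with information about the other player (the loser). What the two rows of a pair have in common is the match_id variable.
I want to create a new variable called rank difference. The idea is to have, on each row, the difference in ATP ranking between the winner and the loser.
Here is a subset of the data frame I'm working with: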
# A tibble: 9,290 x 5
# Groups: player_id [444]
match_id player_id rank winner full_name
<chr> <chr> <dbl> <fct> <chr>
1 m_2019_A_0 atp_104731 6 True Kevin Anderson
2 m_2019_A_1 atp_105932 20 True Nikoloz Basilashvili
3 m_2019_A_2 atp_105430 98 True Radu Albot
4 m_2019_A_3 atp_105882 137 True Stefano Travaglia
5 m_2019_A_4 atp_104269 28 True Fernando Verdasco
6 m_2019_A_5 atp_104655 94 True Pablo Cuevas
7 m_2019_A_7 atp_126774 15 True Stefanos Tsitsipas
8 m_2019_A_8 atp_105777 21 True Grigor Dimitrov
9 m_2019_A_9 atp_126207 39 True Frances Tiafoe
10 m_2019_A_10 atp_104745 2 True Rafael Nadal
# ... with 9,280 more rows
Here is what I tried, which didn't work:
final_match_with_player %>%
group_by(match_id) %>%
mutate(diff_rank = rank[winner == 'True'] - rank[winner == 'False'])
Do you know how I could do this?
Thank you very much!
Does this get you what you want?
library(dplyr)

# example data
final_match_with_player <- tibble::tribble(
  ~match_id,    ~player_id,   ~rank, ~winner, ~full_name,
  "m_2019_A_0", "atp_104731", 6L,    TRUE,    "Kevin Anderson",
  "m_2019_A_1", "atp_105932", 20L,   TRUE,    "Nikoloz Basilashvili",
  "m_2019_A_2", "atp_105430", 98L,   TRUE,    "Radu Albot",
  "m_2019_A_3", "atp_105882", 137L,  TRUE,    "Stefano Travaglia",
  "m_2019_A_0", "atp_106666", 30L,   FALSE,   "Joe Bloggs",
  "m_2019_A_1", "atp_106667", 40L,   FALSE,   "John Doe",
  "m_2019_A_2", "atp_106668", 50L,   FALSE,   "Some Body",
  "m_2019_A_3", "atp_106669", 60L,   FALSE,   "Tennis Pro"
)

# create rank_diff: negate the winner's rank, then sum within each match,
# which gives the loser's rank minus the winner's rank on both rows
final_match_with_player %>%
  group_by(match_id) %>%
  mutate(rank_diff = ifelse(winner, -rank, rank),
         rank_diff = sum(rank_diff)) %>%
  ungroup()
Which results in:
# A tibble: 8 × 6
match_id player_id rank winner full_name rank_diff
<chr> <chr> <int> <lgl> <chr> <int>
1 m_2019_A_0 atp_104731 6 TRUE Kevin Anderson 24
2 m_2019_A_1 atp_105932 20 TRUE Nikoloz Basilashvili 20
3 m_2019_A_2 atp_105430 98 TRUE Radu Albot -48
4 m_2019_A_3 atp_105882 137 TRUE Stefano Travaglia -77
5 m_2019_A_0 atp_106666 30 FALSE Joe Bloggs 24
6 m_2019_A_1 atp_106667 40 FALSE John Doe 20
7 m_2019_A_2 atp_106668 50 FALSE Some Body -48
8 m_2019_A_3 atp_106669 60 FALSE Tennis Pro -77
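Note that in the question's printout winner is a factor with the labels True and False rather than a logical column. Below is a minimal sketch of the same idea under that assumption. The sample data and the names matches and is_winner are made up for illustration, and the guard for incomplete matches is my own addition rather than something the question asked for:

```r
library(dplyr)

# Hypothetical sample mirroring the question's column types;
# match "m_3" deliberately has only one row
matches <- tibble::tibble(
  match_id = c("m_1", "m_1", "m_2", "m_2", "m_3"),
  rank     = c(6, 30, 98, 50, 15),
  winner   = factor(c("True", "False", "True", "False", "True"))
)

result <- matches %>%
  # turn the factor into a logical flag (%in% never returns NA)
  mutate(is_winner = winner %in% "True") %>%
  group_by(match_id) %>%
  # loser's rank minus winner's rank, or NA when the match does not
  # consist of exactly one winner row and one loser row
  mutate(rank_diff = if (n() == 2 && sum(is_winner) == 1) {
    rank[!is_winner] - rank[is_winner]
  } else {
    NA_real_
  }) %>%
  ungroup()

result
```

Matches that lack one of their two rows end up with NA instead of raising an error or silently producing a misleading number.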
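Edit based on further information
It might be easier to simply arrange by match_id and rank, then mutate using the lead and lag functions inside a case_when condition: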
# create rank_diff
final_match_with_player %>%
  arrange(match_id, rank) %>%
  mutate(rank_diff = case_when(lead(match_id) == match_id ~ lead(rank) - rank,
                               TRUE ~ lag(rank) - rank))
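Giving (each row now shows the opponent's rank minus its own, so the two rows of a match carry opposite signs):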
# A tibble: 8 × 6
match_id player_id rank winner full_name rank_diff
<chr> <chr> <int> <lgl> <chr> <int>
1 m_2019_A_0 atp_104731 6 TRUE Kevin Anderson 24
2 m_2019_A_0 atp_106666 30 FALSE Joe Bloggs -24
3 m_2019_A_1 atp_105932 20 TRUE Nikoloz Basilashvili 20
4 m_2019_A_1 atp_106667 40 FALSE John Doe -20
5 m_2019_A_2 atp_106668 50 FALSE Some Body 48
6 m_2019_A_2 atp_105430 98 TRUE Radu Albot -48
7 m_2019_A_3 atp_106669 60 FALSE Tennis Pro 77
8 m_2019_A_3 atp_105882 137 TRUE Stefano Travaglia -77
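One caveat with the lead/lag version: it is not grouped, so it relies on every match_id having exactly two rows. A match with a single row would fall through to the lag branch and pick up a rank from the previous match. A quick check along these lines may be worth running first. This is only a sketch, and the data and the names matches and incomplete are hypothetical:

```r
library(dplyr)

# Hypothetical data: match "m_2" is missing its second row
matches <- tibble::tibble(
  match_id = c("m_1", "m_1", "m_2", "m_3", "m_3"),
  rank     = c(6L, 30L, 20L, 98L, 50L)
)

# list every match that does not have exactly two rows
incomplete <- matches %>%
  count(match_id) %>%
  filter(n != 2)

incomplete
```

If incomplete comes back empty, the lead/lag approach should be safe to apply; otherwise those matches need to be dropped or handled separately.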