Comparing dates between games to find possible advantages
I want to see whether the number of rest days a team has, compared to its opponent, affects the outcome of a game. Here is the data I'm starting with:
library(tidyverse)
library(lubridate)
schedule <- data.frame(Home = c('DAL', 'KC', 'DAL', 'OAK'),
Away = c('OAK', 'PHI', 'PHI', 'KC'),
Home_Final = c(30, 21, 28, 14),
Away_Final = c(35, 28, 7, 21),
Date = c('9/1/2015', '9/2/2015', '9/9/2015', '9/9/2015')
)
If I filter down to a single team, I can calculate its rest days like this:
schedule <- schedule %>% filter(Home == 'PHI' |
Away == 'PHI')
day_dif = interval(lag(mdy(schedule$Date)),
mdy(schedule$Date))
schedule <- schedule %>%
mutate(days_off = (time_length(day_dif, "days")) - 1)
But what I really need is something like this:
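Home | Away | Home_Final | Away_Final | Date | Home_Rest | Away_Rest | Adv | Adv_Days | Adv_Won |
---|---|---|---|---|---|---|---|---|---|
DAL | OAK | 30 | 35 | 9/1/2015 | null | null | null | null | null |
KC | PHI | 21 | 28 | 9/2/2015 | null | null | null | null | null |
DAL | PHI | 28 | 7 | 9/9/2015 | 8 | 7 | 1 | 1 | 1 |
OAK | KC | 14 | 21 | 9/9/2015 | 8 | 7 | 1 | 1 | 0 |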
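'Home_Rest' = days between games for the home team
'Away_Rest' = days between games for the away team
'Adv' = True/False, whether one side has a rest advantage
'Adv_Days' = size of the advantage in days
'Adv_Won' = whether the side with the advantage won
I can't get this by filtering down to individual teams, and I haven't found a good way to combine the teams back together. I'm really lost on how to take this beyond a single team and how to group things correctly so I can look at a whole season.
Any help would be appreciated!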
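This is a classic case of pivot_longer followed by pivot_wider.
To be safe, we'll add an index, game_id, so the two rows belonging to each game can be matched up again later.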
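# Give every game an id so its two team rows can be paired up again later
schedule$game_id <- 1:nrow(schedule)
tschedule <- schedule %>%
  mutate(Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  # One row per team per game; Road records whether the team was Home or Away
  pivot_longer(cols = c(Home, Away), names_to = "Road", values_to = "Team") %>%
  group_by(Team) %>%
  # Previous game date for the same team (assumes rows are already sorted by Date)
  mutate(lagDate = lag(Date)) %>%
  mutate(Rest = Date - lagDate) %>%
  ungroup()
# Back to one row per game, with Team_* and Rest_* columns for each side
rest_schedule <- tschedule %>%
  pivot_wider(id_cols = c(game_id, Date, Home_Final, Away_Final), names_from = Road, values_from = c(Team, Rest))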
Now you can use rest_schedule to calculate anything you want.
rest_schedule$Adv <- rest_schedule$Rest_Home != rest_schedule$Rest_Away
rest_schedule
# A tibble: 4 × 9
game_id Date Home_Final Away_Final Team_Home Team_Away Rest_Home Rest_Away Adv
<int> <date> <dbl> <dbl> <chr> <chr> <drtn> <drtn> <lgl>
1 1 2015-09-01 30 35 DAL OAK NA days NA days NA
2 2 2015-09-02 21 28 KC PHI NA days NA days NA
3 3 2015-09-09 28 7 DAL PHI 8 days 7 days TRUE
4 4 2015-09-09 14 21 OAK KC 8 days 7 days TRUE
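The remaining columns from the question (Adv_Days and Adv_Won) could be derived from the same wide table. The following is only a sketch under a few assumptions: rest is converted to plain numbers of days, the helper columns Adv_Team and Winner are names introduced here for illustration and are not part of the original answer, and games with equal rest or a tied score are left as NA rather than counted either way.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the question
schedule <- data.frame(
  Home = c("DAL", "KC", "DAL", "OAK"),
  Away = c("OAK", "PHI", "PHI", "KC"),
  Home_Final = c(30, 21, 28, 14),
  Away_Final = c(35, 28, 7, 21),
  Date = c("9/1/2015", "9/2/2015", "9/9/2015", "9/9/2015")
)

# Rebuild the wide table with the pivot_longer / pivot_wider approach,
# storing rest as a plain number of days instead of a difftime
rest_schedule <- schedule %>%
  mutate(game_id = row_number(),
         Date = as.Date(Date, format = "%m/%d/%Y")) %>%
  pivot_longer(cols = c(Home, Away), names_to = "Road", values_to = "Team") %>%
  arrange(Date) %>%                      # lag() relies on chronological order
  group_by(Team) %>%
  mutate(Rest = as.numeric(Date - lag(Date), units = "days")) %>%
  ungroup() %>%
  pivot_wider(id_cols = c(game_id, Date, Home_Final, Away_Final),
              names_from = Road, values_from = c(Team, Rest))

result <- rest_schedule %>%
  mutate(
    Adv_Days = abs(Rest_Home - Rest_Away),   # size of the rest gap in days
    Adv = Adv_Days > 0,                      # TRUE when one side had more rest
    # Hypothetical helper: the team with more rest (NA if equal or unknown)
    Adv_Team = case_when(
      Rest_Home > Rest_Away ~ Team_Home,
      Rest_Away > Rest_Home ~ Team_Away
    ),
    # Hypothetical helper: the team that won (NA on a tied score)
    Winner = case_when(
      Home_Final > Away_Final ~ Team_Home,
      Away_Final > Home_Final ~ Team_Away
    ),
    Adv_Won = Adv_Team == Winner             # did the better-rested team win?
  ) %>%
  arrange(game_id)

result %>%
  select(game_id, Team_Home, Team_Away, Rest_Home, Rest_Away, Adv, Adv_Days, Adv_Won)
```

Over a full season, something like mean(result$Adv_Won, na.rm = TRUE) would then give the share of games won by the better-rested side, ignoring each team's first game, where rest is unknown.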