使用 RVest 创建 HTML Table 然后使用操作和清理进入 DF
Using RVest to Create HTML Table And Then Using Manipulating and Cleaning into DF
我想抓取以下网页:
https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837
... 具体来说,是顶部菜单中的“逐个播放”选项卡。获取信息非常简单:
library(tidyverse)
library(rvest)
url <- "https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837"
page <- rvest::read_html(url)
page %>%
html_table() %>%
pluck(27)
... 结果是:
# A tibble: 7 x 2
`Assumption at 15:00` `Assumption at 15:00`
<chr> <chr>
1 Down & Distance Play
2 Assumption at 15:00 Assumption at 15:00
3 1st and 10 at ASM19 Assumption drive start at 15:00.
4 1st and 10 at ASM19 Turner,Easton rush for 2 yards loss to the ASM17 (Justice,Amani).
5 2nd and 12 at ASM17 Exum-Strong,Khaleed rush for 9 yards gain to the ASM26 (Justice,Amani).
6 3rd and 3 at ASM26 Turner,Easton pass incomplete to Collier,Bailey broken up by Meyers,Ryan.
7 4th and 3 at ASM26 Bertolazzo,Gabriel punt 46 yards to the KTZ28.
这就是我迷路的地方。我想获取这些信息并将其处理到不同的列中。例如,这是来自 Excel 的屏幕截图,其中我列出了我希望完成的输出看起来像的内容:
如您所见,我正在从每个单独的游戏中获取信息并将这些信息翻译成特定的栏目。是抢戏吗?是传球吗?获得了多少码?向下、距离、yardline_100等
然后在整个游戏过程中执行该过程。
任何关于如何启动流程的建议 and/or 都将不胜感激。说到 R,抓取当然不是我的核心优势。
这是一种使用 tidyverse
中的函数实现结果的方法。有很多不同的方法可以得到相同的结果,这只是一种方法。代码结构分为三个主要部分:首先,通过绑定多个列表的行构建一个大数据框,第二个删除原始数据框中无用的行,第三个创建所有变量。
tab
数据框也与您的 page
原始输入略有不同,请参阅数据和函数部分中的代码。我基本上更改了列名,使它们不相同,并将它们重命名为 col1
和 col2
.
实际使用的只是几个不同的函数。我创建了 extract_digit
,它从字符串中提取第 n 次出现的数字。 str_extract
和 str_match
从字符串中提取指定的模式,而 str_detects
仅检测(而 returns 逻辑,TRUE 或 FALSE)。 word
从字符串中获取第 n 个单词。
library(tidyverse)
tab %>%
# Bind rows of the list to make a big data frame
bind_rows() %>%
# Define the quarter
mutate(quarter = str_extract(paste(col1, col2), "\w+(?=\s+quarter)")) %>%
fill(quarter) %>%
# Keep only rows that starts with a number
filter(str_detect(col1, '^[0-9]')) %>%
# Create some variables
mutate(drive_start = str_extract(col2, '[0-9][0-9]:[0-9][0-9]')) %>%
fill(drive_start) %>%
# Remove duplicated rows in the first column, keeping the ones with the longest string
group_by(col1) %>%
slice_max(nchar(col2)) %>%
ungroup() %>%
# ordering the rows
arrange(quarter, desc(drive_start)) %>%
# Mutating part, creation of new variables
mutate(posteam = ifelse(str_detect(col1, "ASM"), "ASM", "KTZ"),
defteam = ifelse(str_detect(col1, "ASM"), "KTZ", "ASM"),
yardline_100 = 100 - extract_digit(col1, 3),
down = extract_digit(col1, 1),
ydstogo = extract_digit(col1, 2),
ydsgained = -c(diff(ydstogo), 0),
penalty = str_detect(col2, "PENALTY"),
end = str_detect(col2, "End"),
play_type = case_when(penalty ~ "penalty",
end ~ "end",
T ~ word(col2, 2)),
rush = +(play_type == "rush"),
pass = +(play_type == "pass"),
special = +(!play_type %in% c("rush", "pass")),
passer = ifelse(pass == 1, word(col2, 1), NA),
rusher = ifelse(rush == 1, word(col2, 1), NA),
tackle = str_match(col2, "(?<=\().+?(?=\))")[,1], #Get word between parentheses
complete_pass = case_when(str_detect(col2, "incomplete") & pass == 1 ~ 0,
pass == 1 ~ 1,
TRUE ~ NA_real_),
pass_breakup = word(col2, 2, sep = "broken up by "),
punt = +(play_type == "punt"),
punter = ifelse(punt == 1, word(col2, 1), NA)) %>%
select(-c(col1, col2, penalty, end))
输出
# A tibble: 128 x 19
quarter drive_start posteam defteam yardline_100 down ydstogo ydsgained play_type rush pass special passer rusher tackle complete_pass pass_breakup punt punter
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <int> <int> <int> <chr> <chr> <chr> <dbl> <chr> <int> <chr>
1 1st 15:00 ASM KTZ 81 1 10 -2 rush 1 0 0 NA Turner,Easton Justice,Amani NA NA 0 NA
2 1st 15:00 ASM KTZ 83 2 12 9 rush 1 0 0 NA Exum-Strong,Khaleed Justice,Amani NA NA 0 NA
3 1st 15:00 ASM KTZ 74 3 3 0 pass 0 1 0 Turner,Easton NA NA 0 Meyers,Ryan. 0 NA
4 1st 15:00 ASM KTZ 74 4 3 -7 punt 0 0 1 NA NA NA NA NA 1 Berto~
5 1st 13:30 KTZ ASM 72 1 10 0 rush 1 0 0 NA Nickel,Eric Borguet,Keelan NA NA 0 NA
6 1st 13:30 KTZ ASM 61 1 10 -7 rush 1 0 0 NA Davis,Jordan Allen,Edward NA NA 0 NA
7 1st 13:30 KTZ ASM 68 2 17 10 pass 0 1 0 Nickel,Eric NA Malm,Daniel 1 NA 0 NA
8 1st 13:30 KTZ ASM 58 2 7 -1 penalty 0 0 1 NA NA Omokaro,Akugbe NA NA 0 NA
9 1st 13:30 KTZ ASM 70 2 8 6 rush 1 0 0 NA Davis,Jordan Omokaro,Akugbe; Wright,Tr~ NA NA 0 NA
10 1st 13:30 KTZ ASM 53 3 2 0 pass 0 1 0 Nickel,Eric NA NA 0 NA 0 NA
# ... with 118 more rows
数据与功能
extract_digit <- function(x, n) as.numeric(sapply(str_extract_all(x, "[0-9]+"), `[`, n))
# This function is a wrapper to get the nth occurrence of a number in a string.
tab <- page %>%
html_table() %>%
.[26:47] %>%
map(~ .x %>%
tibble(.name_repair = "universal") %>%
rename_with(~ str_c("col", 1:2)))
[[1]]
# A tibble: 4 x 1
col1
<chr>
1 Play
2 Kutztown wins toss and defers; ASM will receive; KTZ will defend East end-zone.
3 Start of 1st quarter, clock 15:00.
4 Krcic,Dean kickoff 65 yards to the ASM00 Easton,Brevin return 19 yards to the ASM19 (~
[[2]]
# A tibble: 7 x 2
col1 col2
<chr> <chr>
1 Down & Distance Play
2 Assumption at 15:00 Assumption at 15:00
3 1st and 10 at ASM19 Assumption drive start at 15:00.
4 1st and 10 at ASM19 Turner,Easton rush for 2 yards loss to the ASM17 (Justice,Amani).
5 2nd and 12 at ASM17 Exum-Strong,Khaleed rush for 9 yards gain to the ASM26 (Justice,A~
6 3rd and 3 at ASM26 Turner,Easton pass incomplete to Collier,Bailey broken up by Meye~
7 4th and 3 at ASM26 Bertolazzo,Gabriel punt 46 yards to the KTZ28.
# ...
我尝试创建了一个 extract_plays
函数,该函数基本上通过使用一系列 stringr::str_detect()
、stringr::str_extract()
和 if_else()
来解析 Play 列函数。
棘手的是每个季度开始时的表格有些不一致,需要特别注意,否则代码应该相当self-explanatory。
我不是美式足球的粉丝,所以请检查我所做的一些假设。
library(tidyverse)
library(rvest)
url <- "https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837"
page <- rvest::read_html(url)
# set index for teams
teams <- tibble(team = c("Assumption", "Kutztown"),
posteam = c("ASM", "KTZ"),
defteam = c("KTZ","ASM"))
# grab the play pages from table
plays <- page %>%
html_table() %>%
.[27:47] %>%
# use colnames to extract teams later
map(~rbind(colnames(.x),.x))
## note that 7th and 17th elements of plays are the 2nd and 4th quarter starts - different format
## 11th element is 2nd half start, incompatible table (not included in result)
# create plays extraction function
extract_plays <- function(df){
df <- df %>%
set_names(c("Downs","Play")) %>%
mutate(team = str_extract(first(Downs), "^\S+"),
drive_start = str_extract(first(Downs),"\S+$")) %>%
filter(str_detect(Downs, "and")) %>%
inner_join(teams) %>%
extract(Downs,
into = c("down","ydstogo","yardline_100"),
regex = "(^\d..) and (\d+) at (.*)") %>%
mutate(yardline_100 = ifelse(str_detect(yardline_100,defteam),
parse_number(yardline_100),
100 - parse_number(yardline_100))) %>%
mutate(pass = +str_detect(Play, "pass"),
rush = +str_detect(Play, "rush"),
punt = +str_detect(Play, "punt"),
special = ifelse(pass + rush == 0,1,0)) %>%
mutate(play_type = case_when(str_detect(Play, "pass") ~ "pass",
str_detect(Play, "rush") ~ "rush",
str_detect(Play, "punt") ~ "punt",
str_detect(Play, "field goal") ~ "fieldgoal",
str_detect(Play, "kickoff") ~ "kickoff",
TRUE ~ "other")) %>%
mutate(passer = str_extract(Play, "(.*)(?=\spass)"),
rusher = str_extract(Play, "(.*)(?=\srush)"),
punter = str_extract(Play, "(.*)(?=\spunt)"),
tackle = NA,
tackle = ifelse(play_type == "pass",
str_extract(Play, "(?<=\().+?(?=\))"), tackle),
tackle = ifelse(play_type == "rush",
str_extract(Play, "(?<=\().+?(?=\))"), tackle),
yrds = ifelse(play_type == "rush", str_extract(Play, "(\d+\syards\s\w{4})"),NA),
yrds = ifelse(play_type == "pass", str_extract(Play, "for\s\d+\syards"),yrds),
yrdsgained = ifelse(str_detect(yrds,"loss"),-1*parse_number(yrds),parse_number(yrds)),
complete_pass = case_when(str_detect(Play, "completed") ~ 1,
str_detect(Play, "incomplete") ~ 0,
TRUE ~ NA_real_),
pass_breakup = str_extract(Play,"((?<=broken\sup\sby\s).*$)")
) %>%
select(drive_start,posteam,defteam,yardline_100,down,ydstogo,play_type,pass,rush,special,passer,rusher,yrdsgained,tackle,complete_pass,pass_breakup,punt,punter,Play)
return(df)
}
## Grab for each quarter
Q1 <- plays %>%
.[1:6] %>%
map_df(extract_plays) %>%
mutate(Quarter = "1st", .before = drive_start)
Q2 <- plays %>%
.[8:10] %>%
map_df(extract_plays) %>%
mutate(Quarter = "2nd", .before = drive_start)
Q3 <- plays %>%
.[12:16] %>%
map_df(extract_plays) %>%
mutate(Quarter = "3rd", .before = drive_start)
Q4 <- plays %>%
.[18:21] %>%
map_df(extract_plays) %>%
mutate(Quarter = "4th", .before = drive_start)
## Grab the special cases for 2nd and 4th quarter starts
Q2.1 <- plays %>%
.[[7]] %>% rbind(c("Kutztown at 15:00","Play"),.) %>%
extract_plays() %>%
mutate(Quarter = "2nd", .before = drive_start)
Q4.1 <- plays %>%
.[[17]] %>% rbind(c("Kutztown at 15:00","Play"),.) %>%
extract_plays()%>%
mutate(Quarter = "4th", .before = drive_start)
## Add to Q2 and Q4
Q2 <- bind_rows(Q2.1,Q2)
Q4 <- bind_rows(Q4.1,Q4)
#final table
result <- bind_rows(list(Q1,Q2,Q3,Q4))
result
给予:
# A tibble: 170 × 20
Quarter drive_start posteam defteam yardline_100 down ydstogo play_type pass rush
<chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl>
1 1st 15:00 ASM KTZ 81 1st 10 other 0 0
2 1st 15:00 ASM KTZ 81 1st 10 rush 0 1
3 1st 15:00 ASM KTZ 83 2nd 12 rush 0 1
4 1st 15:00 ASM KTZ 74 3rd 3 pass 1 0
5 1st 15:00 ASM KTZ 74 4th 3 punt 0 0
6 1st 13:30 KTZ ASM 72 1st 10 other 0 0
7 1st 13:30 KTZ ASM 72 1st 10 rush 0 1
8 1st 13:30 KTZ ASM 70 2nd 8 rush 0 1
9 1st 13:30 KTZ ASM 66 3rd 4 rush 0 1
10 1st 13:30 KTZ ASM 61 1st 10 rush 0 1
# … with 160 more rows, and 10 more variables: special <dbl>, passer <chr>, rusher <chr>,
# yrdsgained <dbl>, tackle <chr>, complete_pass <chr>, pass_breakup <chr>, punt <dbl>,
# punter <chr>, Play <chr>
我想抓取以下网页:
https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837
... 具体来说,是顶部菜单中的“逐个播放”选项卡。获取信息非常简单:
library(tidyverse)
library(rvest)
url <- "https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837"
page <- rvest::read_html(url)
page %>%
html_table() %>%
pluck(27)
... 结果是:
# A tibble: 7 x 2
`Assumption at 15:00` `Assumption at 15:00`
<chr> <chr>
1 Down & Distance Play
2 Assumption at 15:00 Assumption at 15:00
3 1st and 10 at ASM19 Assumption drive start at 15:00.
4 1st and 10 at ASM19 Turner,Easton rush for 2 yards loss to the ASM17 (Justice,Amani).
5 2nd and 12 at ASM17 Exum-Strong,Khaleed rush for 9 yards gain to the ASM26 (Justice,Amani).
6 3rd and 3 at ASM26 Turner,Easton pass incomplete to Collier,Bailey broken up by Meyers,Ryan.
7 4th and 3 at ASM26 Bertolazzo,Gabriel punt 46 yards to the KTZ28.
这就是我迷路的地方。我想获取这些信息并将其处理到不同的列中。例如,这是来自 Excel 的屏幕截图,其中我列出了我希望完成的输出看起来像的内容:
如您所见,我正在从每个单独的游戏中获取信息并将这些信息翻译成特定的栏目。是抢戏吗?是传球吗?获得了多少码?向下、距离、yardline_100等
然后在整个游戏过程中执行该过程。
任何关于如何启动流程的建议 and/or 都将不胜感激。说到 R,抓取当然不是我的核心优势。
这是一种使用 tidyverse
中的函数实现结果的方法。有很多不同的方法可以得到相同的结果,这只是一种方法。代码结构分为三个主要部分:首先,通过绑定多个列表的行构建一个大数据框,第二个删除原始数据框中无用的行,第三个创建所有变量。
tab
数据框也与您的 page
原始输入略有不同,请参阅数据和函数部分中的代码。我基本上更改了列名,使它们不相同,并将它们重命名为 col1
和 col2
.
实际使用的只是几个不同的函数。我创建了 extract_digit
,它从字符串中提取第 n 次出现的数字。 str_extract
和 str_match
从字符串中提取指定的模式,而 str_detects
仅检测(而 returns 逻辑,TRUE 或 FALSE)。 word
从字符串中获取第 n 个单词。
library(tidyverse)
tab %>%
# Bind rows of the list to make a big data frame
bind_rows() %>%
# Define the quarter
mutate(quarter = str_extract(paste(col1, col2), "\w+(?=\s+quarter)")) %>%
fill(quarter) %>%
# Keep only rows that starts with a number
filter(str_detect(col1, '^[0-9]')) %>%
# Create some variables
mutate(drive_start = str_extract(col2, '[0-9][0-9]:[0-9][0-9]')) %>%
fill(drive_start) %>%
# Remove duplicated rows in the first column, keeping the ones with the longest string
group_by(col1) %>%
slice_max(nchar(col2)) %>%
ungroup() %>%
# ordering the rows
arrange(quarter, desc(drive_start)) %>%
# Mutating part, creation of new variables
mutate(posteam = ifelse(str_detect(col1, "ASM"), "ASM", "KTZ"),
defteam = ifelse(str_detect(col1, "ASM"), "KTZ", "ASM"),
yardline_100 = 100 - extract_digit(col1, 3),
down = extract_digit(col1, 1),
ydstogo = extract_digit(col1, 2),
ydsgained = -c(diff(ydstogo), 0),
penalty = str_detect(col2, "PENALTY"),
end = str_detect(col2, "End"),
play_type = case_when(penalty ~ "penalty",
end ~ "end",
T ~ word(col2, 2)),
rush = +(play_type == "rush"),
pass = +(play_type == "pass"),
special = +(!play_type %in% c("rush", "pass")),
passer = ifelse(pass == 1, word(col2, 1), NA),
rusher = ifelse(rush == 1, word(col2, 1), NA),
tackle = str_match(col2, "(?<=\().+?(?=\))")[,1], #Get word between parentheses
complete_pass = case_when(str_detect(col2, "incomplete") & pass == 1 ~ 0,
pass == 1 ~ 1,
TRUE ~ NA_real_),
pass_breakup = word(col2, 2, sep = "broken up by "),
punt = +(play_type == "punt"),
punter = ifelse(punt == 1, word(col2, 1), NA)) %>%
select(-c(col1, col2, penalty, end))
输出
# A tibble: 128 x 19
quarter drive_start posteam defteam yardline_100 down ydstogo ydsgained play_type rush pass special passer rusher tackle complete_pass pass_breakup punt punter
<chr> <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <chr> <int> <int> <int> <chr> <chr> <chr> <dbl> <chr> <int> <chr>
1 1st 15:00 ASM KTZ 81 1 10 -2 rush 1 0 0 NA Turner,Easton Justice,Amani NA NA 0 NA
2 1st 15:00 ASM KTZ 83 2 12 9 rush 1 0 0 NA Exum-Strong,Khaleed Justice,Amani NA NA 0 NA
3 1st 15:00 ASM KTZ 74 3 3 0 pass 0 1 0 Turner,Easton NA NA 0 Meyers,Ryan. 0 NA
4 1st 15:00 ASM KTZ 74 4 3 -7 punt 0 0 1 NA NA NA NA NA 1 Berto~
5 1st 13:30 KTZ ASM 72 1 10 0 rush 1 0 0 NA Nickel,Eric Borguet,Keelan NA NA 0 NA
6 1st 13:30 KTZ ASM 61 1 10 -7 rush 1 0 0 NA Davis,Jordan Allen,Edward NA NA 0 NA
7 1st 13:30 KTZ ASM 68 2 17 10 pass 0 1 0 Nickel,Eric NA Malm,Daniel 1 NA 0 NA
8 1st 13:30 KTZ ASM 58 2 7 -1 penalty 0 0 1 NA NA Omokaro,Akugbe NA NA 0 NA
9 1st 13:30 KTZ ASM 70 2 8 6 rush 1 0 0 NA Davis,Jordan Omokaro,Akugbe; Wright,Tr~ NA NA 0 NA
10 1st 13:30 KTZ ASM 53 3 2 0 pass 0 1 0 Nickel,Eric NA NA 0 NA 0 NA
# ... with 118 more rows
数据与功能
extract_digit <- function(x, n) as.numeric(sapply(str_extract_all(x, "[0-9]+"), `[`, n))
# This function is a wrapper to get the nth occurrence of a number in a string.
tab <- page %>%
html_table() %>%
.[26:47] %>%
map(~ .x %>%
tibble(.name_repair = "universal") %>%
rename_with(~ str_c("col", 1:2)))
[[1]]
# A tibble: 4 x 1
col1
<chr>
1 Play
2 Kutztown wins toss and defers; ASM will receive; KTZ will defend East end-zone.
3 Start of 1st quarter, clock 15:00.
4 Krcic,Dean kickoff 65 yards to the ASM00 Easton,Brevin return 19 yards to the ASM19 (~
[[2]]
# A tibble: 7 x 2
col1 col2
<chr> <chr>
1 Down & Distance Play
2 Assumption at 15:00 Assumption at 15:00
3 1st and 10 at ASM19 Assumption drive start at 15:00.
4 1st and 10 at ASM19 Turner,Easton rush for 2 yards loss to the ASM17 (Justice,Amani).
5 2nd and 12 at ASM17 Exum-Strong,Khaleed rush for 9 yards gain to the ASM26 (Justice,A~
6 3rd and 3 at ASM26 Turner,Easton pass incomplete to Collier,Bailey broken up by Meye~
7 4th and 3 at ASM26 Bertolazzo,Gabriel punt 46 yards to the KTZ28.
# ...
我尝试创建了一个 extract_plays
函数,该函数基本上通过使用一系列 stringr::str_detect()
、stringr::str_extract()
和 if_else()
来解析 Play 列函数。
棘手的是每个季度开始时的表格有些不一致,需要特别注意,否则代码应该相当self-explanatory。
我不是美式足球的粉丝,所以请检查我所做的一些假设。
library(tidyverse)
library(rvest)
url <- "https://kubears.com/sports/football/stats/2021/assumption/boxscore/11837"
page <- rvest::read_html(url)
# set index for teams
teams <- tibble(team = c("Assumption", "Kutztown"),
posteam = c("ASM", "KTZ"),
defteam = c("KTZ","ASM"))
# grab the play pages from table
plays <- page %>%
html_table() %>%
.[27:47] %>%
# use colnames to extract teams later
map(~rbind(colnames(.x),.x))
## note that 7th and 17th elements of plays are the 2nd and 4th quarter starts - different format
## 11th element is 2nd half start, incompatible table (not included in result)
# create plays extraction function
extract_plays <- function(df){
df <- df %>%
set_names(c("Downs","Play")) %>%
mutate(team = str_extract(first(Downs), "^\S+"),
drive_start = str_extract(first(Downs),"\S+$")) %>%
filter(str_detect(Downs, "and")) %>%
inner_join(teams) %>%
extract(Downs,
into = c("down","ydstogo","yardline_100"),
regex = "(^\d..) and (\d+) at (.*)") %>%
mutate(yardline_100 = ifelse(str_detect(yardline_100,defteam),
parse_number(yardline_100),
100 - parse_number(yardline_100))) %>%
mutate(pass = +str_detect(Play, "pass"),
rush = +str_detect(Play, "rush"),
punt = +str_detect(Play, "punt"),
special = ifelse(pass + rush == 0,1,0)) %>%
mutate(play_type = case_when(str_detect(Play, "pass") ~ "pass",
str_detect(Play, "rush") ~ "rush",
str_detect(Play, "punt") ~ "punt",
str_detect(Play, "field goal") ~ "fieldgoal",
str_detect(Play, "kickoff") ~ "kickoff",
TRUE ~ "other")) %>%
mutate(passer = str_extract(Play, "(.*)(?=\spass)"),
rusher = str_extract(Play, "(.*)(?=\srush)"),
punter = str_extract(Play, "(.*)(?=\spunt)"),
tackle = NA,
tackle = ifelse(play_type == "pass",
str_extract(Play, "(?<=\().+?(?=\))"), tackle),
tackle = ifelse(play_type == "rush",
str_extract(Play, "(?<=\().+?(?=\))"), tackle),
yrds = ifelse(play_type == "rush", str_extract(Play, "(\d+\syards\s\w{4})"),NA),
yrds = ifelse(play_type == "pass", str_extract(Play, "for\s\d+\syards"),yrds),
yrdsgained = ifelse(str_detect(yrds,"loss"),-1*parse_number(yrds),parse_number(yrds)),
complete_pass = case_when(str_detect(Play, "completed") ~ 1,
str_detect(Play, "incomplete") ~ 0,
TRUE ~ NA_real_),
pass_breakup = str_extract(Play,"((?<=broken\sup\sby\s).*$)")
) %>%
select(drive_start,posteam,defteam,yardline_100,down,ydstogo,play_type,pass,rush,special,passer,rusher,yrdsgained,tackle,complete_pass,pass_breakup,punt,punter,Play)
return(df)
}
## Grab for each quarter
Q1 <- plays %>%
.[1:6] %>%
map_df(extract_plays) %>%
mutate(Quarter = "1st", .before = drive_start)
Q2 <- plays %>%
.[8:10] %>%
map_df(extract_plays) %>%
mutate(Quarter = "2nd", .before = drive_start)
Q3 <- plays %>%
.[12:16] %>%
map_df(extract_plays) %>%
mutate(Quarter = "3rd", .before = drive_start)
Q4 <- plays %>%
.[18:21] %>%
map_df(extract_plays) %>%
mutate(Quarter = "4th", .before = drive_start)
## Grab the special cases for 2nd and 4th quarter starts
Q2.1 <- plays %>%
.[[7]] %>% rbind(c("Kutztown at 15:00","Play"),.) %>%
extract_plays() %>%
mutate(Quarter = "2nd", .before = drive_start)
Q4.1 <- plays %>%
.[[17]] %>% rbind(c("Kutztown at 15:00","Play"),.) %>%
extract_plays()%>%
mutate(Quarter = "4th", .before = drive_start)
## Add to Q2 and Q4
Q2 <- bind_rows(Q2.1,Q2)
Q4 <- bind_rows(Q4.1,Q4)
#final table
result <- bind_rows(list(Q1,Q2,Q3,Q4))
result
给予:
# A tibble: 170 × 20
Quarter drive_start posteam defteam yardline_100 down ydstogo play_type pass rush
<chr> <chr> <chr> <chr> <dbl> <chr> <chr> <chr> <dbl> <dbl>
1 1st 15:00 ASM KTZ 81 1st 10 other 0 0
2 1st 15:00 ASM KTZ 81 1st 10 rush 0 1
3 1st 15:00 ASM KTZ 83 2nd 12 rush 0 1
4 1st 15:00 ASM KTZ 74 3rd 3 pass 1 0
5 1st 15:00 ASM KTZ 74 4th 3 punt 0 0
6 1st 13:30 KTZ ASM 72 1st 10 other 0 0
7 1st 13:30 KTZ ASM 72 1st 10 rush 0 1
8 1st 13:30 KTZ ASM 70 2nd 8 rush 0 1
9 1st 13:30 KTZ ASM 66 3rd 4 rush 0 1
10 1st 13:30 KTZ ASM 61 1st 10 rush 0 1
# … with 160 more rows, and 10 more variables: special <dbl>, passer <chr>, rusher <chr>,
# yrdsgained <dbl>, tackle <chr>, complete_pass <chr>, pass_breakup <chr>, punt <dbl>,
# punter <chr>, Play <chr>