Building Multiple URLs using modify_url
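I created a page-scraping function to pull some data. I would like to be able to create a list of URLs, so that I can pass more than one value per argument in the function call and build a different URL for each combination. Is there a way to do this with httr::modify_url?
My code for creating a single URL is as follows: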
library(tidyverse)
#> Registered S3 methods overwritten by 'ggplot2':
#> method from
#> [.quosures rlang
#> c.quosures rlang
#> print.quosures rlang
library(httr)
# Arguments for Function
hand = NULL
prp = "P"
month = NULL
year = 2019
pitch_type = "FA"
report_type = "pfx"
lim = 0
url <- httr::modify_url(
  "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php",
  query = list(
    hand = hand,
    reportType = report_type,
    prp = prp,
    month = month,
    year = year,
    pitch = pitch_type,
    ds = "velo",
    lim = lim
  )
)
# Single Query Result
url
#> [1] "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=pfx&prp=P&year=2019&pitch=FA&ds=velo&lim=0"
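I would like to know whether some combination of the httr::modify_url query above and purrr::reduce(paste0) could be used to generate the URLs for the additional parameter values: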
# Requested Query
pitch_type = c("FA", "SI")
report_type = c("pfx", "outcome")
# URL Generating Function for User inputs
generate_urls <- function(hand = NULL, report_type = c("pfx", "outcome"), prp = "P", month = NULL, year = NULL, pitch_type = c("FA", "SI"), lim = 0) {
# Not sure of what to put in function for modify_url call
}
# Result
"https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=pfx&prp=P&year=2019&pitch=FA&ds=velo&lim=0"
#> [1] "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=pfx&prp=P&year=2019&pitch=FA&ds=velo&lim=0"
"https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=pfx&prp=P&year=2019&pitch=SI&ds=velo&lim=0"
#> [1] "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=pfx&prp=P&year=2019&pitch=SI&ds=velo&lim=0"
"https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=outcome&prp=P&year=2019&pitch=FA&ds=velo&lim=0"
#> [1] "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=outcome&prp=P&year=2019&pitch=FA&ds=velo&lim=0"
"https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=outcome&prp=P&year=2019&pitch=SI&ds=velo&lim=0"
#> [1] "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php?reportType=outcome&prp=P&year=2019&pitch=SI&ds=velo&lim=0"
Here is an option using tidyverse functions. First, we can define the parameter space we want to iterate over:
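# Names must match the query keys the site expects (reportType, pitch, ds),
# as in the single-URL example from the question.
params <- list(
  hand = NULL,
  reportType = c("pfx", "outcome"),
  prp = "P",
  month = NULL,
  year = 2019,
  pitch = c("FA", "SI"),
  ds = "velo",
  lim = 0
)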
Then we can get all the URLs:
library(tidyverse) # tidyr for crossing(); purrr for pmap(), map_chr()
library(httr)
baseurl <- "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php"
crossing(!!!params) %>%
  pmap(list) %>%
  map_chr(~ modify_url(baseurl, query = .x))
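crossing() takes care of generating every possible combination of the parameters. pmap(list) then turns each row of the resulting tibble into its own list, which is what we need to pass to the query= argument of modify_url(). Finally, map_chr() calls modify_url() once per set of parameters and returns one URL string for each combination.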
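To tie this back to the generate_urls() stub in the question, here is a minimal sketch of how the same idea could be wrapped in a function. It rests on a few assumptions that are not stated in the answer: the mapping from argument names to query keys (report_type to reportType, pitch_type to pitch) and the fixed ds = "velo" are copied from the question's single-URL example, and unset (NULL) arguments are dropped explicitly with purrr::compact() before the combinations are built.

```r
library(httr)   # modify_url()
library(purrr)  # compact(), pmap_chr()
library(tidyr)  # crossing()

generate_urls <- function(hand = NULL, report_type = c("pfx", "outcome"),
                          prp = "P", month = NULL, year = NULL,
                          pitch_type = c("FA", "SI"), lim = 0) {
  baseurl <- "https://legacy.baseballprospectus.com/pitchfx/leaderboards/index.php"

  # Assumed mapping of function arguments onto the site's query keys,
  # taken from the single-URL example in the question.
  params <- list(
    hand = hand,
    reportType = report_type,
    prp = prp,
    month = month,
    year = year,
    pitch = pitch_type,
    ds = "velo",
    lim = lim
  )

  # Drop arguments left as NULL, then build one row per combination.
  grid <- crossing(!!!compact(params))

  # Each row becomes a named list that is passed as the query.
  pmap_chr(grid, function(...) modify_url(baseurl, query = list(...)))
}

urls <- generate_urls(year = 2019)
length(urls)  # 2 report types x 2 pitch types
```

Note that crossing() sorts its output, so the URLs may not come back in the order the values were supplied; if the order matters, tidyr::expand_grid() could be swapped in.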