Iterating API GET over dates and combining data sets in R
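I am trying to create a function that can iterate over a specified time span (for example, the past 30 days or the past 90 days). Each pull is limited to 2,500 records, so I may need to pull 1 day or 1 week at a time, depending on my parameters.
I have looked at the material here, but could not get it to do quite what I want. I created a while() loop that generates the URLs: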
end_date <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")
the_date <- start_date
while (the_date <= end_date)
{
  api <- paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                the_date,
                "^",
                end_date,
                "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  the_date <- the_date + 1
  as.character(api)
  print(api)
}
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-27^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-28^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-29^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
[1] "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=2020-01-30^2020-01-30&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
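Note that the loop above prints each URL but does not keep it. If the goal is an actual character vector of URLs, one possible sketch is to build the dates with seq() and let paste0() vectorise over them. The helper name build_urls is made up for illustration, and the query string is copied as posted, including the truncated "&col..." tail and the fixed end date in every URL.

```r
# Hypothetical helper (not part of the original post): one URL per day.
# The query string is copied verbatim from the question; "&col..." is
# truncated in the source and is left as-is.
build_urls <- function(start_date, end_date) {
  days <- seq(start_date, end_date, by = "day")  # Date vector, one per day
  paste0(
    "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
    days,      # Dates are converted to "YYYY-MM-DD" strings
    "^",
    end_date,  # same end date in every URL, as in the question
    "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."
  )
}

urls <- build_urls(as.Date("2020-01-27"), as.Date("2020-01-30"))
length(urls)
```

The result is an ordinary character vector, so it can be passed to lapply() or a for loop later on.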
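This is where I get stuck. I would like to create a function that iterates over each URL and then combines the data.
When I do a single pull, I use the following: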
library(httr)  # GET()
library(XML)   # xmlTreeParse(), getNodeSet(), xmlToDataFrame()
api_get <- GET(url)
api_raw <- rawToChar(api_get$content)
api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
api_df <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
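Since the same four steps would be repeated for every URL, one option is to wrap the single pull in a small function first. The sketch below is only an illustration: get_pcrs is a made-up name, the UTF-8 encoding is an assumption about the API, and the HTTP status check and the empty-result branch are additions that the original snippet does not have.

```r
# Hypothetical helper (not from the original post): fetch one URL and return
# its <pcr> records as a data frame. Requires the httr and XML packages.
get_pcrs <- function(url) {
  resp <- httr::GET(url)
  httr::stop_for_status(resp)  # turn HTTP 4xx/5xx responses into R errors
  # Assumption: the response body is UTF-8 encoded XML.
  xml_text <- httr::content(resp, as = "text", encoding = "UTF-8")
  tree <- XML::xmlParse(xml_text, asText = TRUE)
  nodes <- XML::getNodeSet(tree, "//pcr")
  if (length(nodes) == 0) {
    return(data.frame())  # no records matched this query
  }
  XML::xmlToDataFrame(tree, nodes = nodes, stringsAsFactors = FALSE)
}

# Usage (performs a real request, so it is left commented out):
# api_df <- get_pcrs(url)
```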
Creating 30 of these is definitely not the most efficient approach... I would appreciate some help with this.
Assuming your query string and the parsing of the API response are correct, this script should work.
See the comments for details:
library(httr)  # GET()
library(XML)   # xmlTreeParse(), getNodeSet(), xmlToDataFrame()

end_date <- Sys.Date()
start_date <- as.Date("2020-01-27", format = "%Y-%m-%d")
the_date <- start_date
# Create an empty list to collect one data frame per date
output <- list()
while (the_date <= end_date)
{
  # Track which date is being pulled - handy for debugging when the script errors
  print(the_date)
  url <- paste0("https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1=",
                the_date,
                "^",
                end_date,
                "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col...")
  api_get <- GET(url)
  api_raw <- rawToChar(api_get$content)
  api_tree <- xmlTreeParse(api_raw, useInternalNodes = TRUE)
  # Append the data frame to the list - item named by date
  output[[as.character(the_date)]] <- xmlToDataFrame(api_tree, nodes = getNodeSet(api_tree, "//pcr"))
  # Slight pause between requests to avoid hammering the server
  Sys.sleep(0.7)
  the_date <- the_date + 1
}
# Combine all of the data frames in the output list into one large data frame
alloutput <- do.call(rbind, output)
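The question also asks for a function that can pull 1 day or 1 week at a time. As posted, every request runs from the_date to end_date, so consecutive pulls cover overlapping ranges; if the "between" filter includes both endpoints (an assumption about this API), non-overlapping windows would avoid duplicate rows. Below is a rough sketch of that idea. The names make_windows, fetch_window and pull_range are invented for illustration, and the "&col..." tail is left truncated exactly as in the source.

```r
# Hypothetical helpers (not from the original answer).
base_url <- "https://website.com/api/?action=search_pcrs&e1=eTimes.03&o1=between&v1="
url_tail <- "&e2=eMedications.03&o2=in&v2type=id&v2=14731^14730^14729^3864&col..."  # truncated in the source

# Split [start_date, end_date] into consecutive windows of `step_days` days.
# The last window is clipped so it never runs past end_date.
make_windows <- function(start_date, end_date, step_days = 1) {
  from <- seq(start_date, end_date, by = step_days)
  to <- pmin(from + step_days - 1, end_date)
  data.frame(from = from, to = to)
}

# Fetch and parse one window; needs the httr and XML packages when called.
fetch_window <- function(from, to) {
  resp <- httr::GET(paste0(base_url, from, "^", to, url_tail))
  httr::stop_for_status(resp)
  tree <- XML::xmlParse(httr::content(resp, as = "text", encoding = "UTF-8"),
                        asText = TRUE)
  XML::xmlToDataFrame(tree, nodes = XML::getNodeSet(tree, "//pcr"))
}

# Pull every window in turn and stack the results into one data frame.
pull_range <- function(start_date, end_date, step_days = 1,
                       fetch = fetch_window, pause = 0.7) {
  w <- make_windows(start_date, end_date, step_days)
  pieces <- lapply(seq_len(nrow(w)), function(i) {
    if (i > 1) Sys.sleep(pause)  # short pause between requests
    df <- fetch(w$from[i], w$to[i])
    # The question states a 2,500-record cap per pull, so a full window
    # may have been cut off.
    if (nrow(df) >= 2500) {
      warning("Window ", format(w$from[i]), " to ", format(w$to[i]),
              " returned 2,500 or more rows; consider a smaller step_days")
    }
    df
  })
  do.call(rbind, pieces)
}

# Usage (performs real requests, so it is left commented out):
# last_30 <- pull_range(Sys.Date() - 30, Sys.Date(), step_days = 7)
```

Passing the fetch step in as an argument keeps the windowing logic separate from the HTTP call, so it can be tried out with a stub function before hitting the real server. do.call(rbind, ...) expects every piece to have the same columns, just like in the script above.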