Download all the files (.zip and .txt) from a webpage using R
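I am trying to download all the files (.zip and .txt) from a website, but I can't seem to find a way to do it. I tried the suggestions in a couple of existing answers, but without success.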
Website:
https://pubs.usgs.gov/sir/2007/5107/downloads/
(I need to do this for several similar USGS pages, so doing it by hand is not an option.)
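Try this. It works because the file URLs follow a repeatable pattern. Getting the file names out of the web page is a bit clumsy, but it does seem to work.
Many of the text files may be missing an end-of-line marker (this is common), which can raise an error. It is probably not an important error, though. If it happens, open the downloaded .txt file to make sure it downloaded correctly. There is no doubt a way to automate that step, but I don't have time for it, dude (or dudette, or whatever you prefer).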
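One rough way to automate that check is sketched here. It is only a proxy: it reports which expected files are missing or empty, and does not validate their contents. The helper name `check_downloads` and the demo file names are made up for illustration and are not part of the script that follows.

```r
# Hypothetical helper (an assumption, not part of the original answer):
# report which expected files are missing or empty in a directory.
check_downloads <- function(files, dir = ".") {
  paths <- file.path(dir, files)
  sizes <- file.size(paths)            # NA when a file does not exist
  data.frame(
    file   = files,
    exists = !is.na(sizes),
    bytes  = sizes,
    ok     = !is.na(sizes) & sizes > 0, # present and non-empty
    stringsAsFactors = FALSE
  )
}

# Self-contained demo in a temporary directory (made-up file names)
demo_dir <- tempfile("dl_check_")
dir.create(demo_dir)
writeLines("some data", file.path(demo_dir, "good.txt"))  # non-empty file
file.create(file.path(demo_dir, "empty.txt"))             # zero-byte file
report <- check_downloads(c("good.txt", "empty.txt", "missing.zip"), demo_dir)
print(report)
```

After a real run, the same function could be given the file names that were downloaded and the directory they were saved in. Any row with `ok == FALSE` is worth opening by hand.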
# get the download page's HTML, one element per line
page <- "https://pubs.usgs.gov/sir/2007/5107/downloads/"
a <- readLines(page)

# find lines of interest
# (fixed = TRUE so that "." is a literal dot, not a regex wildcard)
loc.txt <- grep(".txt", a, fixed = TRUE)
loc.zip <- grep(".zip", a, fixed = TRUE)

# A convenience function that uses
#   line   - a line from the original page
#   marker - marker of file type, used to locate the end of the file name
#   page   - URL of the original page
#------------------------------------
convfn <- function(line, marker, page){
  # position of the first character after href="
  i  <- unlist(gregexpr(pattern = 'href="', line, fixed = TRUE)) + 6
  # position of the last character of the extension
  # (the marker is a dot, three letters, then a closing quote)
  i2 <- unlist(gregexpr(pattern = marker, line, fixed = TRUE)) + 3
  # target file
  .destfile <- substring(line, i[1], i2[1])
  # target url (page already ends in "/", so no extra separator)
  .url <- paste0(page, .destfile)
  # print targets
  cat(.url, '\n', .destfile, '\n')
  # the workhorse function; mode = "wb" keeps .zip files intact on Windows
  download.file(url = .url, destfile = .destfile, mode = "wb")
}
#--------------------------------------------

# files will be saved in your working directory
# use setwd() to change it if needed
print(getwd())

# get the .txt files and download them
sapply(a[loc.txt],
       FUN = convfn,
       marker = '.txt"', # this is the key part, locates the text file name
       page = page)

# get the .zip files and download them
sapply(a[loc.zip],
       FUN = convfn,
       marker = '.zip"', # this is the key part, locates the zip file name
       page = page)
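Since the question mentions several similar pages, the same idea can be wrapped in a function. What follows is a minimal sketch, not the method above: it swaps the character-position arithmetic for a regular expression that pulls out every href in one pass. The names `extract_links` and `download_all` and the sample HTML are made up for illustration. It assumes the page URL ends in "/" and that the links are relative to it.

```r
# Hypothetical helper: return every href that ends in one of the extensions
extract_links <- function(html, ext = c("txt", "zip")) {
  pattern <- sprintf('href="[^"]+\\.(%s)"', paste(ext, collapse = "|"))
  hits <- unlist(regmatches(html, gregexpr(pattern, html, ignore.case = TRUE)))
  # strip the leading href=" and the trailing quote, drop duplicates
  unique(sub('^href="', "", sub('"$', "", hits), ignore.case = TRUE))
}

# Hypothetical wrapper: download every matching file linked from one page.
# Assumes `page` ends in "/" and that the hrefs are relative to it.
download_all <- function(page, dest = ".") {
  files <- extract_links(readLines(page, warn = FALSE))
  for (f in files) {
    download.file(paste0(page, f), file.path(dest, basename(f)), mode = "wb")
  }
  invisible(files)
}

# Offline demo on made-up HTML lines (no network access needed)
sample_html <- c(
  '<a href="readme.txt">readme.txt</a>',
  '<a href="data/part1.zip">part1.zip</a> <a href="notes.TXT">notes</a>',
  '<a href="index.html">index</a>'
)
print(extract_links(sample_html))
```

To cover several pages, something like `lapply(pages, download_all)` should do, where `pages` is a character vector of directory URLs such as the one in the question. Whether the other USGS pages use the same link layout is an assumption worth checking on one page first.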