Download all the files (.zip and .txt) from a webpage using R

Question

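I am trying to download all the files (.zip and .txt) from a website, but I can't seem to find a way to do it. I tried the suggestions from a couple of related posts, but without success.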

Website: https://pubs.usgs.gov/sir/2007/5107/downloads/

(I need to do this for several similar USGS pages, so doing it by hand is not an option.)

Answer 1

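Give this a try. It works because the file URLs follow a repeatable pattern. Pulling the file names out of the webpage is a bit clumsy, but it does seem to work.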

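Many of the text files may be missing an end-of-line marker (which is common), and this may raise an error. It is probably not an important error, though. If it happens, open the downloaded .txt file to make sure it downloaded correctly. No doubt there is a way to automate that step, but I don't have time for it, dude (or dudette, or whatever you prefer).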
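The manual check described above could be partly automated along these lines, to be run once the download script has finished. This is only a rough sketch: `check_downloads` is a hypothetical helper name, and it only flags files that are missing or empty. It does not verify the contents or the final end-of-line marker.

```r
# Hypothetical helper: report which downloaded files look suspicious.
# A file is flagged as not ok if it is missing or has zero bytes.
check_downloads <- function(files) {
  info <- file.info(files)          # size is NA for files that do not exist
  data.frame(
    file  = files,
    found = !is.na(info$size),
    bytes = info$size,
    ok    = !is.na(info$size) & info$size > 0,
    stringsAsFactors = FALSE
  )
}

# Example: check every .txt and .zip file in the working directory
downloaded <- list.files(pattern = "\\.(txt|zip)$")
print(check_downloads(downloaded))
```

Any row with `ok` equal to `FALSE` is a file worth opening by hand or downloading again.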

#get homepage for locations
page <- "https://pubs.usgs.gov/sir/2007/5107/downloads/"
a <- readLines(page)

#find lines of interest (fixed = TRUE so "." is matched literally)
loc.txt <- grep(".txt", a, fixed = TRUE)
loc.zip <- grep(".zip", a, fixed = TRUE)

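#A convenience function that takes:
#  line   - a line of HTML from the original page
#  marker - the file-type marker (e.g. '.txt"') used to locate the file name
#  page   - the url of the original page
#------------------------------------
convfn <- function(line, marker, page){
  #start of the file name: first character after href="
  i <- unlist(gregexpr(pattern = 'href="', line, fixed = TRUE)) + 6
  #end of the file name: last character of the extension
  i2 <- unlist(gregexpr(pattern = marker, line, fixed = TRUE)) + 3
  #target file
  .destfile <- substring(line, i[1], i2[1])
  #target url (drop any trailing "/" from page to avoid a double slash)
  .url      <- paste(sub("/+$", "", page), .destfile, sep = "/")
  #print targets
  cat(.url, '\n', .destfile, '\n')
  #the workhorse function (mode = "wb" keeps .zip files intact on Windows)
  download.file(url=.url, destfile=.destfile, mode="wb")
  }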
#--------------------------------------------

#they will save in your working directory
#use setwd() to change if needed
print(getwd())
  
#get the .txt files and download them
sapply(a[loc.txt], 
       FUN = convfn, 
       marker = '.txt"', #this is key part, locates text file name
       page = page)

#get the .zip files and download them
sapply(a[loc.zip], 
       FUN = convfn, 
       marker = '.zip"', #this is key part, locates zip file name
       page = page)
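Since the question mentions several similar pages, the same idea can be wrapped into a reusable function. The following is a minimal sketch rather than a tested solution for every USGS page: `extract_links` and `download_all` are hypothetical names, and the code assumes links are written as lowercase `href="name.ext"` with paths relative to the listing page.

```r
# Hypothetical helper: pull href targets ending in the given extensions
# out of raw HTML lines. Assumes links are written as href="name.ext".
extract_links <- function(html_lines, exts = c("txt", "zip")) {
  pattern <- sprintf('href="[^"]+\\.(%s)"', paste(exts, collapse = "|"))
  hits <- unlist(regmatches(html_lines, gregexpr(pattern, html_lines)))
  # strip the leading href=" and the trailing quote, drop duplicates
  unique(sub('^href="', "", sub('"$', "", hits)))
}

# Hypothetical wrapper: download every matching file from one listing page.
download_all <- function(page, destdir = ".") {
  files <- extract_links(readLines(page, warn = FALSE))
  for (f in files) {
    # assumes the hrefs are relative to the listing page
    url <- paste0(sub("/+$", "", page), "/", f)
    download.file(url, destfile = file.path(destdir, basename(f)), mode = "wb")
  }
  invisible(files)
}

# Usage (the second URL is a placeholder, not a real page):
# pages <- c("https://pubs.usgs.gov/sir/2007/5107/downloads/",
#            "https://example.org/another/downloads/")
# for (p in pages) download_all(p)
```

Matching the whole `href="..."` attribute in one regular expression avoids the position arithmetic in `convfn`, at the cost of relying on the links being formatted consistently.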
