How to skip an error and continue in a for loop in R

Question

I want to scrape the URLs of the pictures on a list of web pages. I tried the following code.

library(rvest)

pic_flat = data.frame()

for (i in 7:60){
  # creating a loop for page urls
  link <- paste0("https://www.immobilienscout24.at/regional/wien/wien/wohnung-kaufen/seite-", i)
  page <- read_html(link)
  # scraping href and creating a url
  href <- page %>% html_elements("a.YXjuW") %>% html_attr('href')
  apt_link <- paste0("https://www.immobilienscout24.at",href)
  pic_flat = rbind(pic_flat, data.frame(apt_link))
}

# get the link to the apartment picture
apt_pic <- data.frame()
apt <- pic_flat$apt_link

for (x in apt) {
  picture <- read_html(x) %>% html_element(".CmhTt") %>% html_attr("src")
  apt_pic <- rbind(apt_pic, data.frame(picture))
}
df_pic <- cbind(pic_flat, data.frame(apt_pic))

However, some of the pages fail during the iteration. For example:

Error in open.connection(x, "rb") : HTTP error 502.

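So I want to skip those pages, move on to the next one, and scrape the available picture URLs into my data frame. How can I do this with the tryCatch function or any other method?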

Answer 1

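We can create a function and then use tryCatch or possibly (from the purrr package) to skip the errors.

First, create the function f1 to get the picture link: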

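library(rvest)
library(purrr)  # provides possibly()

# function f1: return the picture URL for one apartment page
f1 <- function(x) {
  x %>% read_html() %>% html_element(".CmhTt") %>% html_attr("src")
}

apt <- pic_flat$apt_link

# now loop, returning NA for any page that throws an error
apt_pic <- lapply(apt, possibly(f1, NA))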
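
For comparison, the tryCatch route mentioned above could look roughly like the sketch below. It keeps a for loop, as in the question, and records NA whenever a page fails, so the result stays aligned with the input links. The names safe_fetch and fake_fetch and the example.com URLs are hypothetical; fake_fetch only stands in for the real rvest call so that the sketch runs without network access.

```r
# Hypothetical helper: return fetch(url), or NA if fetch(url) throws an error.
safe_fetch <- function(url, fetch) {
  tryCatch(
    fetch(url),
    error = function(e) {
      # Report the failure and carry on instead of stopping the loop.
      message("Skipping ", url, ": ", conditionMessage(e))
      NA_character_
    }
  )
}

# Stand-in for the real scraper (an assumption, for offline use only).
# It fails for any URL containing "bad", imitating the HTTP 502 error.
fake_fetch <- function(url) {
  if (grepl("bad", url)) stop("HTTP error 502.")
  paste0(url, "/picture.jpg")
}

urls <- c("https://example.com/a",
          "https://example.com/bad",
          "https://example.com/c")

# Pre-allocate one slot per URL so failed pages keep their position.
pictures <- character(length(urls))
for (i in seq_along(urls)) {
  pictures[i] <- safe_fetch(urls[i], fake_fetch)
}

# Links and pictures have the same length, so they combine cleanly.
df_pic <- data.frame(apt_link = urls, picture = pictures)
```

In real use, fake_fetch would be swapped for f1 (or the read_html pipeline from the question). Because failed pages yield NA rather than being dropped, the links and pictures can be combined without a length mismatch.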

error-handling

r

try-catch

rvest