Mechanize: HTTP 404 Not Found for a link

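I'm using Mechanize to build a scraper that reads URLs from a CSV file and downloads the images.

The problem is that some of the images no longer exist, so I get a 404 Not Found error. I'm new to Ruby and don't know how to handle exceptions; I hope someone can help.

Here is what I'm trying to do: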

agent = Mechanize.new

url = CSV.read("links.csv")

begin
    url.each do |url|
        puts url
        agent.get(url.first).save
    end
rescue Net::HTTPNotFound  => e
    puts e.response_code 
    agent = e.agent
end  

The error it gives me is:

/home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize/http/agent.rb:323:in `fetch': 404 => Net::HTTPNotFound for http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg -- unhandled response (Mechanize::ResponseCodeError)
    from descargaimagenes.rb:34:in `fetch_with_retry'
    from /home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize.rb:464:in `get'

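You can rescue the Mechanize::ResponseCodeError exception, which is the class named in parentheses at the end of the error message. From the documentation: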

This error is raised when Mechanize encounters a response code it does not know how to handle. Currently, this exception will be thrown if Mechanize encounters response codes other than 200, 301, or 302. Any other response code is up to the user to handle.
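This is also why the `rescue Net::HTTPNotFound` in the question never fires. As far as I can tell, `Net::HTTPNotFound` is the standard library's class for a 404 response object, not an exception, so nothing is ever raised as one. A quick check that needs only `net/http` (no Mechanize, no network):

```ruby
require 'net/http'

# Net::HTTPNotFound describes a 404 response; it is not an exception class,
# so a `rescue Net::HTTPNotFound` clause has nothing to match.
is_response  = Net::HTTPNotFound.ancestors.include?(Net::HTTPResponse)
is_exception = Net::HTTPNotFound.ancestors.include?(Exception)

puts "response class?  #{is_response}"
puts "exception class? #{is_exception}"
```

Mechanize wraps that response in its own `Mechanize::ResponseCodeError`, which is what the stack trace reports.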

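Also move the begin/rescue inside the each block. With the rescue outside the loop, the first 404 ends the whole iteration. Inside the loop, each URL is fetched and saved on its own, and if the resource is not found the response code is printed and the loop moves on to the next URL.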

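require 'mechanize'

agent = Mechanize.new

[
  'http://www.rockauto.com/Images/whatsnew1.jpg?1512928800',
  'http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg',
  'http://www.rockauto.com/Images/whatsnew2.jpg?1512928800'
].each do |url|
  begin
    # Download the resource and save it to disk.
    agent.get(url).save
  rescue Mechanize::ResponseCodeError => e
    # An unhandled status such as 404 skips this URL only; the loop continues.
    puts e.response_code
  end
end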

Two of the URLs work and the middle one does not, so you should end up with two images, one for each working URL.
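To tie this back to the CSV loop in the question, here is a rough sketch of the same per-row rescue that also records which URLs failed. So that it runs without the gem or a network connection, it uses stand-ins that are my own assumptions: `FakeResponseCodeError` plays the role of `Mechanize::ResponseCodeError`, `fake_get` plays the role of `agent.get(url).save`, and the `example.com` URLs are made up.

```ruby
# Hypothetical stand-in for Mechanize::ResponseCodeError.
class FakeResponseCodeError < StandardError
  attr_reader :response_code

  def initialize(code)
    @response_code = code
    super("#{code} -- unhandled response")
  end
end

# Hypothetical fetcher: pretends any URL containing "missing" answers 404.
def fake_get(url)
  raise FakeResponseCodeError.new('404') if url.include?('missing')
  "saved #{url}"
end

# Same shape as the rows CSV.read returns: one array per line, URL in the first column.
rows = [
  ['http://example.com/a.jpg'],
  ['http://example.com/missing.jpg'],
  ['http://example.com/b.jpg']
]

saved  = []
failed = []

rows.each do |row|
  url = row.first
  begin
    saved << fake_get(url)
  rescue FakeResponseCodeError => e
    # Only this row is abandoned; the loop carries on with the next one.
    failed << [url, e.response_code]
  end
end

puts "saved #{saved.size}, failed #{failed.size}"
```

In the real script you would swap `fake_get(url)` for `agent.get(url).save` and rescue `Mechanize::ResponseCodeError`. Keeping a `failed` list is optional, but it makes it easy to review the dead links after the run.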