Mechanize: HTTP 404 Not Found for a link
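I am using Mechanize to build a scraper that reads a CSV of URLs and downloads the images.
The problem is that some of the images no longer exist, so a 404 Not Found error is raised. I'm new to Ruby and don't know how to handle the exception; I hope someone can help.
Here is what I am trying to do: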
agent = Mechanize.new
url = CSV.read("links.csv")
begin
  url.each do |url|
    puts url
    agent.get(url.first).save
  end
rescue Net::HTTPNotFound => e
  puts e.response_code
  agent = e.agent
end
The error it gives me is:
/home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize/http/agent.rb:323:in `fetch': 404 => Net::HTTPNotFound for http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg -- unhandled response (Mechanize::ResponseCodeError)
from descargaimagenes.rb:34:in `fetch_with_retry'
from /home/miguel/.rbenv/versions/2.4.2/lib/ruby/gems/2.4.0/gems/mechanize-2.7.5/lib/mechanize.rb:464:in `get'
You can rescue the Mechanize::ResponseCodeError exception:
This error is raised when Mechanize encounters a response code it does
not know how to handle. Currently, this exception will be thrown if
Mechanize encounters response codes other than 200, 301, or 302. Any
other response code is up to the user to handle.
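For context on why the `rescue Net::HTTPNotFound` in the question never fires: in Ruby's standard library, Net::HTTPNotFound is the class of a 404 response object, not an exception class. The sketch below uses only `net/http`; the helper `rescue_matches?` is a hypothetical name introduced here for illustration.

```ruby
require 'net/http'

# Net::HTTPNotFound models a 404 *response*; it does not descend from Exception.
puts Net::HTTPNotFound.ancestors.include?(Exception)         # prints false
puts Net::HTTPNotFound.ancestors.include?(Net::HTTPResponse) # prints true

# Hypothetical helper: raise a StandardError and report whether a rescue
# clause listing `listed_class` catches it.
def rescue_matches?(listed_class)
  raise StandardError, 'simulated 404'
rescue listed_class
  true
rescue StandardError
  false
end

puts rescue_matches?(Net::HTTPNotFound) # prints false: the clause is skipped
puts rescue_matches?(StandardError)     # prints true
```

The same reasoning applies to the traceback above: the exception actually raised is Mechanize::ResponseCodeError, so that is the class the rescue clause has to name.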
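Then move the rescue inside the each block, so that each URL is fetched and its image saved, and the response code is printed when the resource is not found: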
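require 'mechanize'

agent = Mechanize.new

[
  'http://www.rockauto.com/Images/whatsnew1.jpg?1512928800',
  'http://www.rockauto.com/info/915/FCA6366_Fronp__ra_p.jpg',
  'http://www.rockauto.com/Images/whatsnew2.jpg?1512928800'
].each do |url|
  begin
    # Fetch the image and save it to disk.
    agent.get(url).save
  rescue Mechanize::ResponseCodeError => e
    # A missing image (404) ends up here; print the code and move on.
    puts e.response_code
  end
end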
Two of the URLs work and the middle one does not, so you should end up with two images, one for each working URL.
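Applied to the CSV from the question, the same pattern might look like the sketch below. It assumes each row of links.csv holds the URL in its first column, as the question's `url.first` suggests. The helper name `download_all` and its `error_class:` parameter are hypothetical, introduced only so the loop can be tried without network access; in real use you would pass a Mechanize agent and Mechanize::ResponseCodeError.

```ruby
require 'csv'

# Hypothetical helper: fetch every URL found in the first CSV column.
# `agent` is any object whose get(url) returns something that responds to
# #save (a Mechanize instance in practice). `error_class` is the exception
# to tolerate per URL. Returns the list of URLs that could not be fetched.
def download_all(agent, csv_path, error_class: StandardError)
  failed = []
  CSV.foreach(csv_path) do |row|
    url = row.first
    next if url.nil? || url.strip.empty? # skip blank rows

    begin
      agent.get(url).save
    rescue error_class => e
      # The rescue sits inside the loop, so one bad URL does not stop the rest.
      warn "Skipping #{url}: #{e.message}"
      failed << url
    end
  end
  failed
end

# Usage with Mechanize (needs the mechanize gem installed):
#   require 'mechanize'
#   failed = download_all(Mechanize.new, 'links.csv',
#                         error_class: Mechanize::ResponseCodeError)
```

Returning the failed URLs is a design choice of this sketch, not something the original answer does; it makes it easy to log or retry the missing images afterwards.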