How do I resolve an HTTP 500 error while web scraping with Mechanize in Ruby?

I want to retrieve my driving licence number, issue_date, and expiry_date from this website ("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp"). When I try to fetch it, I get this error: Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response.

Here is the code I wrote to scrape it:

require 'mechanize'
require 'logger'
require 'nokogiri'
require 'open-uri'
require 'openssl'

OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari 4'
Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")  

page=agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp')  # opening home page.
page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click         # click the link.
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field.
page.form_with(:name=>"dlform").field_with(:name=>"javax.faces.ViewState").value="SUBMIT"  #submit button value assigning.
page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp") #to specify the form i need.
agent.cookie_jar.clear!
gg=agent.submit page.forms.last  #submitting my form

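It does not work because you clear the cookies (agent.cookie_jar.clear!) before submitting the form, which throws away all the input data you provided. I got it working simply by removing that call: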

...

page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field

form = page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp")
gg = agent.submit form, form.buttons.first

Note that you do not need to set a value for the submit button; instead, pass the submit button when you submit the form.