How to parse forms with Mechanize or Nokogiri from a string
I need to parse a form to get the value of `IW_SessionID_` from the HTML I get back, but I can't get it to work.
#!/usr/bin/ruby
require 'pp'
require 'nokogiri'
require 'mechanize'
r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv=\"cache-control\" content=\"no-cache\">\r\n<meta http-equiv=\"pragma\" content=\"no-cache\">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n lWidth = window.innerWidth - 30;\r\n lHeight = window.innerHeight - 30;\r\n} else {\r\n lWidth = document.body.clientWidth;\r\n lHeight = document.body.clientHeight;\r\n if (lWidth == 0) { lWidth = undefined;}\r\n if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements[\"IW_width\"].value = lWidth;\r\ndocument.forms[0].elements[\"IW_height\"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload=\"Initialize()\">\r\n<form method=post action=\"/bwtem\">\r\n<input type=hidden name=\"IW_width\">\r\n<input type=hidden name=\"IW_height\">\r\n<input type=hidden name=\"IW_SessionID_\" value=\"1wqzj1f0vec57r1apfqg51wzs88c\">\r\n<input type=hidden name=\"IW_TrackID_\" value=\"0\">\r\n</form></BODY></HTML>'
page = Nokogiri::HTML r
puts page.css('form[name="IW_SessionID_"]')
a = Mechanize.new
page2 = Mechanize::Page.new(nil,{'content-type'=>'text/html'},r,nil,a)
pp page2.form_with(:name => "IW_SessionID_")
The script just returns nil.
Does anyone know how to get the value of `IW_SessionID_`?
You have to unescape the sample HTML string, and then search for the `input` with the name `IW_SessionID_`.
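To see why the unescaping matters, here is a minimal sketch using a shortened, made-up input tag (the value `abc123` is a placeholder, not from the question). In a single-quoted Ruby string, `\"` is not an escape sequence, so the backslashes end up inside the HTML and the attribute values no longer match the selector:

```ruby
# In a single-quoted Ruby string, \" is NOT an escape: the backslash is kept.
escaped = '<input type=hidden name=\"IW_SessionID_\" value=\"abc123\">'
puts escaped.include?('\\') # prints true, the string still contains backslashes

# Strip the stray backslashes before handing the string to a parser.
unescaped = escaped.gsub('\\"', '"')
puts unescaped
```

If the string comes from somewhere else already escaped, a `gsub` like this is one way to clean it up; if you typed it yourself, simply drop the backslashes.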
This sample code works for me:
#!/usr/bin/ruby
require 'pp'
require 'nokogiri'
require 'mechanize'
r = '<HTML><HEAD><TITLE></TITLE><meta http-equiv="cache-control" content="no-cache">\r\n<meta http-equiv="pragma" content="no-cache">\r\n<NOSCRIPT><HTML><BODY>Your browser does not seem to support JavaScript. Please make sure it is supported and activated</BODY></HTML></NOSCRIPT>\r\n<SCRIPT>\r\nvar ie4 = (document.all)? true:false;\r\nvar ns6 = (document.getElementById)? true && !ie4:false;\r\nfunction Initialize() {\r\nvar lWidth;\r\nvar lHeight;\r\nif (ns6) {\r\n lWidth = window.innerWidth - 30;\r\n lHeight = window.innerHeight - 30;\r\n} else {\r\n lWidth = document.body.clientWidth;\r\n lHeight = document.body.clientHeight;\r\n if (lWidth == 0) { lWidth = undefined;}\r\n if (lHeight == 0) { lHeight = undefined;}\r\n}\r\ndocument.forms[0].elements["IW_width"].value = lWidth;\r\ndocument.forms[0].elements["IW_height"].value = lHeight;\r\ndocument.forms[0].submit();\r\n}</SCRIPT></HEAD><BODY onload="Initialize()">\r\n<form method=post action="/bwtem">\r\n<input type=hidden name="IW_width">\r\n<input type=hidden name="IW_height">\r\n<input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">\r\n<input type=hidden name="IW_TrackID_" value="0">\r\n</form></BODY></HTML>'
page = Nokogiri::HTML r
input = page.css('input[name="IW_SessionID_"]').first
puts input[:value]
It's easy to do once you're familiar with the tools:
require 'nokogiri'
doc = Nokogiri::HTML(DATA.read)
doc.at('input[name="IW_SessionID_"]')['value']
# => "1wqzj1f0vec57r1apfqg51wzs88c"
__END__
<HTML>
<BODY>
<form method=post action="/bwtem">
<input type=hidden name="IW_height">
<input type=hidden name="IW_SessionID_" value="1wqzj1f0vec57r1apfqg51wzs88c">
<input type=hidden name="IW_TrackID_" value="0">
</form>
</BODY>
</HTML>
Don't do things like this:
page.css('form[name="IW_SessionID_"]')
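`css` is used to search for multiple elements matching the selector. It's unlikely that a form would have more than one hidden input with the same name, so `at` would be more sensible. `css` returns a NodeSet, which is akin to an array of nodes, so it doesn't behave like a single node: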
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<html>
<body>
<p>foo</p>
<p>bar</p>
</body>
</html>
EOT
doc.search('p').class # => Nokogiri::XML::NodeSet
doc.at('p').class # => Nokogiri::XML::Element
`text` will concatenate the text of all the nodes in a NodeSet, resulting in a mess:
doc.search('p').text # => "foobar"
whereas using `map(&:text)` will iterate over the nodes, returning the text of each one:
doc.search('p').map(&:text) # => ["foo", "bar"]
Also note that `css(...).first` or `search(...).first` is the same as `at` or one of its `at_*` siblings:
doc.search('p').first.to_html # => "<p>foo</p>"
doc.at('p').to_html # => "<p>foo</p>"
So, for clarity, use `at` instead of `search(...).first`.
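Finally, strip your HTML example down to the minimum needed to demonstrate the problem you're asking about. Anything beyond that wastes space and the time we spend trying to understand the problem.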