How to parse a date using Nokogiri in Ruby
I am trying to parse this page and extract the date that follows
<p>From Date:
I get this error:
Invalid predicate: //b[text() = '<p>From Date: ' (Nokogiri::XML::XPath::SyntaxError)
The XPath from "inspect element" is:
/html/body/div#timelineItems/table/tbody/tr/td/table.resultsTypes/tbody/tr/td/p
Here is the code sample:
#/usr/bin/ruby
require 'Nokogiri'
noko = Nokogiri::HTML('china.html')
noko.xpath("//b[text() = '<p>From Date: ").each do |b|
puts b.next_sibling.content.strip
end
This is the file china.html:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<html lang="en"><head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>File </title>
</head>
<body>
<div id ="timelineItems">
<H2 id="telegram1"> Title </H2>
<p><table cellspacing="0">
<tr>
<td width="2%"> </td>
<td width="75%">
<table cellspacing="0" cellpadding="0" class="resultsTypes">
<tr>
<td width="5%" class="hide"> </td>
<td width="70%">
<p>Template: <span class="bidi">ארכיון בן גוריון - מסמך</span></p>
<p>Title: <a href="http://www.bing.com" title=""><span class="bidi">Meeting in China</span></a></p>
<p>recipient: David Ben Gurion</p>
<p>sender: Prime Minister of Union of Burma, Rangoon</p>
<p> Sub collection: <span class="bidi">התכתבות > תת-חטיבה מכתב</span></p>
<p>From Date: 02/14/1936</p>
<p>Link to file: <span class="bidi">תיק התכתבות 1956 ינואר</span></p>
</td>
</tr>
<tr>
<td colspan="2">
</td>
</tr>
</table></td>
<td class="actions"> </td>
</tr>
</table>
</p>
</div>
</body></html>