Nokogiri: parse, extract and return <tr> content in HTML table
I'm trying to parse an HTML table. What I want is basically the sixth <tr> in the HTML.
The markup:
<HTML>
<HEAD>
<TITLE>date</TITLE>
<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1">
</HEAD>
<BODY bgcolor="white">
<table border=0 cellpadding=0 cellspacing=0>
<tr>
<td align=right colspan=2 id=ptitle name=ptitle>
<font size=3>this is my title</font><br>
</td>
</tr>
<tr>
<td height=10 align=left colspan=2 valign=top>
<table border=0 width="100%" cellpadding=0 cellspacing=0>
<tr>
<td width="50%" align=right><font size=2>this is my subtitle</font></td>
</tr>
</table>
</td>
</tr>
<td valign=top>
<table border=0 cellpadding=0 cellspacing=0>
<tr>
this is a line
</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
<tr>
this is a line</tr>
</table>
</td>
</tr>
</table>
<br>
</BODY>
</HTML>
My Ruby code looks like this:
require 'nokogiri'
require 'open-uri'
url = <website-name>
data = Nokogiri::HTML(open(url))
data.at('<tr>').next[6].text
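But this doesn't work. How can I use Nokogiri to extract all of those <tr>this is a line</tr> rows?
Ideally, I'd like the result in a variable, with the HTML intact, so that I can include it in another website.
Thanks a lot!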
Like this:
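data = Nokogiri::HTML(URI.open(url)) # URI.open comes from open-uri; bare open(url) no longer fetches URLs on Ruby 3.0+
rows = data.css("td[valign='top'] table tr") # every <tr> of a table nested inside a <td valign=top>
rows.each do |row|
  puts row.text # prints the text of each row, e.g. 'this is a line'
end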