Parse an HTML table with Nokogiri in Ruby
I have an HTML table that looks like this:
<table id="TTdata" border="0" cellspacing="0" cellpadding="3" align="center">
<tbody>
<tr class="TTdata_ltblue">
<td class="ctr"><b>#</b></td>
<td class="ctr"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=YEAR">YEAR</a><img src="/images/up.gif"></b></td>
<td class="ctr" title="Player's name."><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=NAME">NAME</a></b></td>
<td class="ctr" title="how many pitches a catcher had a chance/need to frame"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=FR_CHANCES">FR_CHANCES</a></b></td>
<td class="ctr" title="the number of strikes the catcher is expected to have received according to RPM"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=PREDICTED_STRIKES">PREDICTED_STRIKES</a></b></td>
<td class="ctr" title="the number of strikes the catcher actually received"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=ACTUAL_STRIKES">ACTUAL_STRIKES</a></b></td>
<td class="ctr" title="the difference between actual and predicted strikes received by the catcher"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=EXTRA_STRIKES">EXTRA_STRIKES</a></b></td>
<td class="ctr" title="runs RPM credits to the catcher, using the ball-strike context to calculated run value"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=FR_RUNS_ADDED_BY_COUNT">FR_RUNS_ADDED_BY_COUNT</a><img src="/images/down.gif"></b></td>
<td class="ctr" title="how many runs RPM would assign using a generic .14 runs available per frame"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=FR_RUNS_ADDED_BY_CALL">FR_RUNS_ADDED_BY_CALL</a></b></td>
<td class="ctr" title="pitches the catcher received that could have resulted in a wild pitch or passed ball; this is when runners are on base or a dropped third strike is possible"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=BL_CHANCES">BL_CHANCES</a></b></td>
<td class="ctr"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=PREDICTED_PBWP">PREDICTED_PBWP</a></b></td>
<td class="ctr" title="the run value accumulated from preventing wild pitches and passed balls (.28 per PB/WP saved)"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=BL_RUNS_ADDED">BL_RUNS_ADDED</a></b></td>
<td class="ctr" title="the number of passed balls and wild pitches allowed by the catcher"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=ACTUAL_PBWP">ACTUAL_PBWP</a></b></td>
<td class="ctr" title="the difference between actual and predicted passed balls and wild pitches allowed by the catcher
"><b><a href="http://www.baseballprospectus.com/sortable/index.php?cid=1819124&newsort1column=PBWP_SAVED">PBWP_SAVED</a></b></td>
</tr>
<tr class="TTdata">
<td>1.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Yasmani+Grandal" target="_blank">Yasmani Grandal</a></td>
<td class="right">2295</td>
<td class="right">871.5</td>
<td class="right">925</td>
<td class="right">53.5</td>
<td class="right">8.0</td>
<td class="right">8.0</td>
<td class="right">1097</td>
<td class="right">18.0</td>
<td class="right">0.0</td>
<td class="right">18</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>2.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Buster+Posey" target="_blank">Buster Posey</a></td>
<td class="right">2601</td>
<td class="right">1,011.4</td>
<td class="right">1,056</td>
<td class="right">44.6</td>
<td class="right">6.6</td>
<td class="right">6.6</td>
<td class="right">1232</td>
<td class="right">10.0</td>
<td class="right">0.0</td>
<td class="right">10</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>3.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Francisco+Cervelli" target="_blank">Francisco Cervelli</a></td>
<td class="right">2629</td>
<td class="right">989.0</td>
<td class="right">1,033</td>
<td class="right">44.0</td>
<td class="right">6.5</td>
<td class="right">6.5</td>
<td class="right">1357</td>
<td class="right">14.0</td>
<td class="right">0.0</td>
<td class="right">14</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>4.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Mike+Zunino" target="_blank">Mike Zunino</a></td>
<td class="right">2828</td>
<td class="right">1,128.8</td>
<td class="right">1,169</td>
<td class="right">40.2</td>
<td class="right">6.0</td>
<td class="right">6.0</td>
<td class="right">1325</td>
<td class="right">19.0</td>
<td class="right">0.0</td>
<td class="right">19</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>5.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Caleb+Joseph" target="_blank">Caleb Joseph</a></td>
<td class="right">2713</td>
<td class="right">993.9</td>
<td class="right">1,031</td>
<td class="right">37.1</td>
<td class="right">5.5</td>
<td class="right">5.5</td>
<td class="right">1315</td>
<td class="right">9.0</td>
<td class="right">0.0</td>
<td class="right">9</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>6.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Chris+Iannetta" target="_blank">Chris Iannetta</a></td>
<td class="right">2158</td>
<td class="right">847.5</td>
<td class="right">884</td>
<td class="right">36.5</td>
<td class="right">5.4</td>
<td class="right">5.4</td>
<td class="right">1078</td>
<td class="right">15.0</td>
<td class="right">0.0</td>
<td class="right">15</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>7.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Jason+Castro" target="_blank">Jason Castro</a></td>
<td class="right">2679</td>
<td class="right">1,068.9</td>
<td class="right">1,105</td>
<td class="right">36.1</td>
<td class="right">5.4</td>
<td class="right">5.4</td>
<td class="right">1378</td>
<td class="right">18.0</td>
<td class="right">0.0</td>
<td class="right">18</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>8.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Miguel+Montero" target="_blank">Miguel Montero</a></td>
<td class="right">1977</td>
<td class="right">785.8</td>
<td class="right">820</td>
<td class="right">34.2</td>
<td class="right">5.1</td>
<td class="right">5.1</td>
<td class="right">972</td>
<td class="right">11.0</td>
<td class="right">0.0</td>
<td class="right">11</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>9.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Martin+Maldonado" target="_blank">Martin Maldonado</a></td>
<td class="right">2343</td>
<td class="right">906.0</td>
<td class="right">940</td>
<td class="right">34.0</td>
<td class="right">5.1</td>
<td class="right">5.1</td>
<td class="right">1193</td>
<td class="right">17.0</td>
<td class="right">0.0</td>
<td class="right">17</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>10.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Tyler+Flowers" target="_blank">Tyler Flowers</a></td>
<td class="right">2191</td>
<td class="right">833.4</td>
<td class="right">865</td>
<td class="right">31.6</td>
<td class="right">4.7</td>
<td class="right">4.7</td>
<td class="right">1305</td>
<td class="right">13.0</td>
<td class="right">0.0</td>
<td class="right">13</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>11.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Rene+Rivera" target="_blank">Rene Rivera</a></td>
<td class="right">2632</td>
<td class="right">1,043.1</td>
<td class="right">1,070</td>
<td class="right">26.9</td>
<td class="right">4.0</td>
<td class="right">4.0</td>
<td class="right">1331</td>
<td class="right">18.0</td>
<td class="right">0.0</td>
<td class="right">18</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>12.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Russell+Martin" target="_blank">Russell Martin</a></td>
<td class="right">2919</td>
<td class="right">1,121.3</td>
<td class="right">1,148</td>
<td class="right">26.7</td>
<td class="right">4.0</td>
<td class="right">4.0</td>
<td class="right">1470</td>
<td class="right">27.0</td>
<td class="right">0.0</td>
<td class="right">27</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>13.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Kevin+Plawecki" target="_blank">Kevin Plawecki</a></td>
<td class="right">1826</td>
<td class="right">744.0</td>
<td class="right">770</td>
<td class="right">26.0</td>
<td class="right">3.9</td>
<td class="right">3.9</td>
<td class="right">886</td>
<td class="right">9.0</td>
<td class="right">0.0</td>
<td class="right">9</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>14.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=David+Ross" target="_blank">David Ross</a></td>
<td class="right">941</td>
<td class="right">339.6</td>
<td class="right">361</td>
<td class="right">21.4</td>
<td class="right">3.2</td>
<td class="right">3.2</td>
<td class="right">519</td>
<td class="right">5.0</td>
<td class="right">0.0</td>
<td class="right">5</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>15.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Roberto+Perez" target="_blank">Roberto Perez</a></td>
<td class="right">1969</td>
<td class="right">776.5</td>
<td class="right">789</td>
<td class="right">12.5</td>
<td class="right">1.9</td>
<td class="right">1.9</td>
<td class="right">1090</td>
<td class="right">12.0</td>
<td class="right">0.0</td>
<td class="right">12</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>16.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Welington+Castillo" target="_blank">Welington Castillo</a></td>
<td class="right">1047</td>
<td class="right">410.6</td>
<td class="right">420</td>
<td class="right">9.4</td>
<td class="right">1.4</td>
<td class="right">1.4</td>
<td class="right">499</td>
<td class="right">4.0</td>
<td class="right">0.0</td>
<td class="right">4</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>17.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Hank+Conger" target="_blank">Hank Conger</a></td>
<td class="right">1000</td>
<td class="right">405.2</td>
<td class="right">414</td>
<td class="right">8.8</td>
<td class="right">1.3</td>
<td class="right">1.3</td>
<td class="right">511</td>
<td class="right">4.0</td>
<td class="right">0.0</td>
<td class="right">4</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata_ltgrey">
<td>18.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Josh+Thole" target="_blank">Josh Thole</a></td>
<td class="right">476</td>
<td class="right">168.8</td>
<td class="right">177</td>
<td class="right">8.2</td>
<td class="right">1.2</td>
<td class="right">1.2</td>
<td class="right">275</td>
<td class="right">4.0</td>
<td class="right">0.0</td>
<td class="right">4</td>
<td class="right">0.0</td>
</tr>
<tr class="TTdata">
<td>19.</td>
<td class="right">2015</td>
<td><a href="/player_search.php?search_name=Tucker+Barnhart" target="_blank">Tucker Barnhart</a></td>
<td class="right">934</td>
<td class="right">351.4</td>
<td class="right">357</td>
<td class="right">5.6</td>
<td class="right">0.8</td>
<td class="right">0.8</td>
<td class="right">410</td>
<td class="right">4.0</td>
<td class="right">0.0</td>
<td class="right">4</td>
<td class="right">0.0</td>
</tr>
</tbody>
</table>
In this case, I'm interested in retrieving each "player" from the table rows whose class is either TTdata or TTdata_ltgrey. This can be done as follows:
require 'nokogiri'
require 'open-uri'
html = URI.open(url)
doc = Nokogiri::HTML(html)
doc.css('.TTdata, .TTdata_ltgrey').each do |catcher|
  # parse here
end
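My problem is that none of the td entries have a class associated with them. I only know that TD 1 is the position, TD 2 is the year, and TD 3 is the name.
Using the iteration above, what is the correct way to access each td so that I can build a model/hash of name/value pairs for each row?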
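Here is one approach I tried. You can take it further from here to suit your needs:
require 'nokogiri'
require 'pp'
doc = Nokogiri::HTML.parse(File.read("#{__dir__}/out1.html"))
# The class is TTdata_ltgrey; a misspelled selector silently skips every even-numbered row.
data = doc.css('.TTdata, .TTdata_ltgrey').map do |tr|
  # The first three cells are position, year, and name, in that order.
  %i(position year name).zip(tr.css("td:nth-child(-n+3)").map(&:text)).to_h
end
pp data
Output
[{:position=>"1.", :year=>"2015", :name=>"Yasmani Grandal"},
{:position=>"2.", :year=>"2015", :name=>"Buster Posey"},
{:position=>"3.", :year=>"2015", :name=>"Francisco Cervelli"},
{:position=>"4.", :year=>"2015", :name=>"Mike Zunino"},
{:position=>"5.", :year=>"2015", :name=>"Caleb Joseph"},
{:position=>"6.", :year=>"2015", :name=>"Chris Iannetta"},
{:position=>"7.", :year=>"2015", :name=>"Jason Castro"},
{:position=>"8.", :year=>"2015", :name=>"Miguel Montero"},
{:position=>"9.", :year=>"2015", :name=>"Martin Maldonado"},
{:position=>"10.", :year=>"2015", :name=>"Tyler Flowers"},
{:position=>"11.", :year=>"2015", :name=>"Rene Rivera"},
{:position=>"12.", :year=>"2015", :name=>"Russell Martin"},
{:position=>"13.", :year=>"2015", :name=>"Kevin Plawecki"},
{:position=>"14.", :year=>"2015", :name=>"David Ross"},
{:position=>"15.", :year=>"2015", :name=>"Roberto Perez"},
{:position=>"16.", :year=>"2015", :name=>"Welington Castillo"},
{:position=>"17.", :year=>"2015", :name=>"Hank Conger"},
{:position=>"18.", :year=>"2015", :name=>"Josh Thole"},
{:position=>"19.", :year=>"2015", :name=>"Tucker Barnhart"}]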