How to scrape data from a website using Nokogiri

When I try to scrape the table data from the following link, nothing is displayed. I want the data shown in the table at that link, i.e. Last Updated, Weather, and Temperature. This is the code I wrote, but it outputs nothing. Please help.

require 'open-uri'
require 'nokogiri'

url = "http://w1.weather.gov/xml/current_obs/KM89.xml"

docs = Nokogiri::HTML(URI.open(url))

puts docs.css("table")

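Go to that page, open your browser's developer tools, and find the response to the request for KM89.xml under the Network tab. You'll see that the server doesn't return HTML but XML, like this: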

<?xml version="1.0" encoding="ISO-8859-1"?> 
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
   xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
   xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
  <credit>NOAA's National Weather Service</credit>
  <credit_URL>http://weather.gov/</credit_URL>
  <image>
    <url>http://weather.gov/images/xml_logo.gif</url>
    <title>NOAA's National Weather Service</title>
    <link>http://weather.gov</link>
  </image>
  <suggested_pickup>15 minutes after the hour</suggested_pickup>
  <suggested_pickup_period>60</suggested_pickup_period>
  <location>Dexter B Florence Memorial Field Airport, AR</location>
  <station_id>KM89</station_id>
  <latitude>34.1</latitude>
  <longitude>-93.07</longitude>
  <observation_time>Last Updated on Nov 23 2012, 7:56 am CST</observation_time>
        <observation_time_rfc822>Fri, 23 Nov 2012 07:56:00 -0600</observation_time_rfc822>
  <weather>Light Rain</weather>
  <temperature_string>57.0 F (13.8 C)</temperature_string>
  <temp_f>57.0</temp_f>
  <temp_c>13.8</temp_c>
  <relative_humidity>87</relative_humidity>
  <wind_string>Northeast at 8.1 MPH (7 KT)</wind_string>
  <wind_dir>Northeast</wind_dir>
  <wind_degrees>30</wind_degrees>
  <wind_mph>8.1</wind_mph>
  <wind_kt>7</wind_kt>
  <pressure_string>1027.5 mb</pressure_string>
  <pressure_mb>1027.5</pressure_mb>
  <pressure_in>30.30</pressure_in>
  <dewpoint_string>52.9 F (11.6 C)</dewpoint_string>
  <dewpoint_f>52.9</dewpoint_f>
  <dewpoint_c>11.6</dewpoint_c>
  <windchill_string>55 F (13 C)</windchill_string>
        <windchill_f>55</windchill_f>
        <windchill_c>13</windchill_c>
  <visibility_mi>10.00</visibility_mi>
  <icon_url_base>http://forecast.weather.gov/images/wtf/small/</icon_url_base>
  <two_day_history_url>http://www.weather.gov/data/obhistory/KM89.html</two_day_history_url>
  <icon_url_name>ra1.png</icon_url_name>
  <ob_url>http://www.weather.gov/data/METAR/KM89.1.txt</ob_url>
  <disclaimer_url>http://weather.gov/disclaimer.html</disclaimer_url>
  <copyright_url>http://weather.gov/disclaimer.html</copyright_url>
  <privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>

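The table you see in the browser is built by the browser itself, which applies the latest_ob.xsl stylesheet named in the xml-stylesheet line. The raw document contains no <table> element, which is why docs.css("table") finds nothing. Since the data is plain XML, parse it as XML and select the elements directly: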

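require 'open-uri'
require 'nokogiri'

url = 'http://w1.weather.gov/xml/current_obs/KM89.xml'
# The response is XML, so use the XML parser rather than the HTML one
doc = Nokogiri::XML(URI.open(url))

p doc.at_css('station_id').text
p doc.at_css('observation_time').text
p doc.at_css('weather').text
p doc.at_css('temperature_string').text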