How to scrape data from a website using Nokogiri
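When I try to scrape the table data from the following link, nothing is displayed.
I wrote the code below, but it returns nothing. I want the table data from that link, namely the last-updated time, the weather, and the temperature. Please help.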
url = "http://w1.weather.gov/xml/current_obs/KM89.xml"
docs = Nokogiri::HTML(open(url))
puts docs.css("table")
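Go to that page and open your developer tools. When you find the response to the request for KM89.xml under the Network tab, you'll see that it returns XML rather than HTML, like this: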
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="latest_ob.xsl" type="text/xsl"?>
<current_observation version="1.0"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="http://www.weather.gov/view/current_observation.xsd">
<credit>NOAA's National Weather Service</credit>
<credit_URL>http://weather.gov/</credit_URL>
<image>
<url>http://weather.gov/images/xml_logo.gif</url>
<title>NOAA's National Weather Service</title>
<link>http://weather.gov</link>
</image>
<suggested_pickup>15 minutes after the hour</suggested_pickup>
<suggested_pickup_period>60</suggested_pickup_period>
<location>Dexter B Florence Memorial Field Airport, AR</location>
<station_id>KM89</station_id>
<latitude>34.1</latitude>
<longitude>-93.07</longitude>
<observation_time>Last Updated on Nov 23 2012, 7:56 am CST</observation_time>
<observation_time_rfc822>Fri, 23 Nov 2012 07:56:00 -0600</observation_time_rfc822>
<weather>Light Rain</weather>
<temperature_string>57.0 F (13.8 C)</temperature_string>
<temp_f>57.0</temp_f>
<temp_c>13.8</temp_c>
<relative_humidity>87</relative_humidity>
<wind_string>Northeast at 8.1 MPH (7 KT)</wind_string>
<wind_dir>Northeast</wind_dir>
<wind_degrees>30</wind_degrees>
<wind_mph>8.1</wind_mph>
<wind_kt>7</wind_kt>
<pressure_string>1027.5 mb</pressure_string>
<pressure_mb>1027.5</pressure_mb>
<pressure_in>30.30</pressure_in>
<dewpoint_string>52.9 F (11.6 C)</dewpoint_string>
<dewpoint_f>52.9</dewpoint_f>
<dewpoint_c>11.6</dewpoint_c>
<windchill_string>55 F (13 C)</windchill_string>
<windchill_f>55</windchill_f>
<windchill_c>13</windchill_c>
<visibility_mi>10.00</visibility_mi>
<icon_url_base>http://forecast.weather.gov/images/wtf/small/</icon_url_base>
<two_day_history_url>http://www.weather.gov/data/obhistory/KM89.html</two_day_history_url>
<icon_url_name>ra1.png</icon_url_name>
<ob_url>http://www.weather.gov/data/METAR/KM89.1.txt</ob_url>
<disclaimer_url>http://weather.gov/disclaimer.html</disclaimer_url>
<copyright_url>http://weather.gov/disclaimer.html</copyright_url>
<privacy_policy_url>http://weather.gov/notice.html</privacy_policy_url>
</current_observation>
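So you can scrape it like this:
require 'open-uri'
require 'nokogiri'

url = 'http://w1.weather.gov/xml/current_obs/KM89.xml'
# The feed is XML, so parse it with Nokogiri::XML rather than Nokogiri::HTML.
# URI.open replaces the bare open(url), which no longer accepts URLs in Ruby 3.0+.
doc = Nokogiri::XML(URI.open(url))
p doc.at_css('station_id').text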