Scrape sector and industry from Morningstar
I want to scrape the sector and industry from a Morningstar page. I can see the data, and Watir sees it too, but when I try to grab the div, it doesn't return anything.
irb(main):001:0> require 'watir'
=> true
irb(main):008:0> browser= Watir::Browser.new
DevTools listening on ws://127.0.0.1:49780/devtools/browser/4e473d9e-4818-45ad-8238-587bc931099a
=> #<Watir::Browser:0x..f0e9773de url="data:," title="">
irb(main):006:0> path="http://quote.morningstar.ca/Quicktakes/stock/stock_beta.aspx?t=GOOG&region=USA&culture=en-CA"
=> "http://quote.morningstar.ca/Quicktakes/stock/stock_beta.aspx?t=GOOG&region=USA&culture=en-CA"
irb(main):007:0> goto(path)
irb(main):009:0> browser.goto(path)
[41088:42292:1007/225520.743:ERROR:platform_sensor_reader_win.cc(242)] NOT IMPLEMENTED
=> "http://quote.morningstar.ca/Quicktakes/stock/stock_beta.aspx?t=GOOG&region=USA&culture=en-CA"
irb(main):010:0> browser.text.include?"Sector"   # The word "Sector" IS found on the page.
=> true
irb(main):011:0> browser.div(:class=>"sal-dp-panel")   # But the class cannot be found at all.
=> #<Watir::Div: located: false; {:class=>"sal-dp-panel", :tag_name=>"div"}>
irb(main):015:0> divs=browser.divs(:class=>"sal-dp-panel")
=> #<Watir::DivCollection:0x000000079722d0 @query_scope=#<Watir::Browser:0xdbd2266a url="http://quote.morningstar.ca/Quicktakes/stock/stock_beta.aspx?t=GOOG&region=USA&culture=en-CA" title="GOOG 1157.35 -0.93 (Alphabet Inc Class C)">, @selector={:class=>"sal-dp-panel", :tag_name=>"div"}>
irb(main):018:0> divs.count
=> 0
irb(main):019:0> divs.each{|div| puts div.text}
=> []
irb(main):020:0> divs.each{|div| puts "got one"}
=> []
I think you're using the wrong locator. Try the following:
b = Watir::Browser.new
b.goto 'http://quote.morningstar.ca/Quicktakes/stock/stock_beta.aspx?t=GOOG&region=USA&culture=en-CA'
p b.divs(class: 'sal-dp-name')[7].text
p b.div(text: 'Technology').preceding_sibling.text
Output:
"Sector"
"Sector"
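I located "Sector" in two different ways. The second is more reliable than the first, because it finds the "Sector" label through its neighbouring "Technology" value instead of relying on a fixed index ([7]).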
The problem is that there is no element on the page with the class "sal-dp-panel". Perhaps you meant "sal-dp-pair", the div that contains each name/value pair?
<div class="sal-dp-pair">
<div class="sal-dp-name ng-binding">Sector</div>
<div class="sal-dp-value ng-binding">Technology</div>
</div>
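To scrape the sector and industry, you can locate the relevant "sal-dp-name" div and then take its corresponding value (i.e. its following sibling):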
browser.div(class: 'sal-dp-name', text: 'Sector').following_sibling.text
#=> "Technology"
browser.div(class: 'sal-dp-name', text: 'Industry').following_sibling.text
#=> "Internet Content & Information"
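To try the name-to-value lookup from this answer without launching a browser, here is a minimal sketch that uses Ruby's REXML instead of Watir, run against a hypothetical two-pair version of the "sal-dp-pair" snippet above. The wrapper class `sal-dp-container` and the helpers `has_class?` and `value_for` are illustrative assumptions, not part of the Morningstar page or of Watir; REXML's `next_element` plays the role that `following_sibling` plays in the Watir code.

```ruby
require 'rexml/document'

# Hypothetical markup modelled on the "sal-dp-pair" snippet above. The live
# Morningstar page is rendered by AngularJS and its markup may differ or change.
MARKUP = <<~HTML
  <div class="sal-dp-container">
    <div class="sal-dp-pair">
      <div class="sal-dp-name ng-binding">Sector</div>
      <div class="sal-dp-value ng-binding">Technology</div>
    </div>
    <div class="sal-dp-pair">
      <div class="sal-dp-name ng-binding">Industry</div>
      <div class="sal-dp-value ng-binding">Internet Content &amp; Information</div>
    </div>
  </div>
HTML

# Hypothetical helper: true if the element's class attribute contains +klass+
# as one of its space-separated tokens.
def has_class?(element, klass)
  element.attributes['class'].to_s.split.include?(klass)
end

# Hypothetical helper: finds the "sal-dp-name" div whose text is +label+ and
# returns the text of its next sibling element (the "sal-dp-value" div),
# or nil if no such label exists.
def value_for(doc, label)
  name = REXML::XPath.match(doc, '//div').find do |div|
    has_class?(div, 'sal-dp-name') && div.text.to_s.strip == label
  end
  name&.next_element&.text&.strip
end

doc = REXML::Document.new(MARKUP)
puts value_for(doc, 'Sector')    # prints Technology
puts value_for(doc, 'Industry')  # prints Internet Content & Information
```

Matching on the label text rather than a position mirrors the point made in the first answer: the lookup keeps working if other pairs are added before or after "Sector".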