Watir scraping sequential elements: so simple, but no
This should be simple...
I want to use Watir (a Ruby gem)
to scrape some web pages that look like this:
<div class="Time">time1</div>
<div class="Locus">locus1</div>
<div class="Locus">locus2</div>
<div class="Time">time2</div>
<div class="Locus">locus3</div>
<div class="Time">time3</div>
<div class="Locus">locus4</div>
<div class="Locus">locus5</div>
<div class="Locus">locus6</div>
<div class="Time">time4</div>
etc..
The result should be an array like this:
time1 locus1
time1 locus2
time2 locus3
time3 locus4
time3 locus5
time3 locus6
time4 xxx
All the divs are at the same level (they are not nested).
I could not find a solution using Watir methods...
Thanks for your help.
For each Locus element, you can retrieve the closest preceding Time element with the #preceding_sibling
method:
result = browser.divs(class: 'Locus').map do |div|
  time = div.preceding_sibling(class: 'Time').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
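Note that if the list is long, you may want to retrieve the HTML once via Watir and then parse it with Nokogiri. Each Watir element lookup is a separate round trip to the browser, so parsing locally can save a lot of execution time, at the cost of some readability.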
require 'nokogiri'

doc = Nokogiri::HTML.parse(browser.html) # where `browser` is the usual Watir::Browser
result = doc.css('.Locus').map do |div|
  # [1] picks the nearest preceding Time; without it, `at` returns the first Time in document order
  time = div.at('./preceding-sibling::div[@class="Time"][1]').text
  locus = div.text
  "#{time} #{locus}"
end
p result
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6"]
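Both approaches start from the Locus elements, so a Time with no Locus after it (the `time4 xxx` line in the question) never shows up in the result. If that line is wanted, one option is a single forward pass over all the divs. The following is only a sketch under some assumptions: the divs have already been reduced to [class, text] pairs in document order (for instance with something like `browser.divs.map { |d| [d.class_name, d.text] }`, assuming the page contains no other divs), and the helper name `pair_times_with_loci` and its `placeholder` parameter are made up for illustration.

```ruby
# Hypothetical helper: pairs each Locus with the most recent Time seen before it.
# `rows` is an array of [class_name, text] pairs in document order.
# If `placeholder` is given, a Time that has no Locus is emitted with it.
def pair_times_with_loci(rows, placeholder: nil)
  result = []
  current_time = nil
  has_locus = true # true once the current Time has produced at least one line

  rows.each do |class_name, text|
    case class_name
    when 'Time'
      # Flush the previous Time if nothing followed it.
      result << "#{current_time} #{placeholder}" if placeholder && current_time && !has_locus
      current_time = text
      has_locus = false
    when 'Locus'
      # Ignore a Locus that appears before any Time.
      result << "#{current_time} #{text}" if current_time
      has_locus = true
    end
  end

  # Flush the last Time if the list ended without a Locus.
  result << "#{current_time} #{placeholder}" if placeholder && current_time && !has_locus
  result
end

# Sample data mirroring the HTML in the question.
rows = [
  ['Time', 'time1'], ['Locus', 'locus1'], ['Locus', 'locus2'],
  ['Time', 'time2'], ['Locus', 'locus3'],
  ['Time', 'time3'], ['Locus', 'locus4'], ['Locus', 'locus5'], ['Locus', 'locus6'],
  ['Time', 'time4']
]

p pair_times_with_loci(rows, placeholder: 'xxx')
#=> ["time1 locus1", "time1 locus2", "time2 locus3", "time3 locus4", "time3 locus5", "time3 locus6", "time4 xxx"]
```

Leaving out `placeholder` gives the same six lines as the sibling-based versions, since a Time without a Locus is then simply skipped.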