Extract the <p> elements between two h3 headings with Nokogiri

I'm trying to extract only the <p> elements that sit between the Vigentes and Finalizados headings, but I haven't managed to.

require 'nokogiri'
require 'open-uri'
require 'time'

@url = "http://www.caru.org.uy/web/servicios/llamados-a-concurso-publico-para-contratar-personal/"

# Ruby 3.0+ no longer lets Kernel#open fetch URLs; use URI.open from open-uri
page = Nokogiri::HTML(URI.open(@url))
div_content = page.css('.contenido')

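# Each item is a whole .contenido element, so item.text prints all of its
# text at once, and item.css('h3').text joins the text of every <h3> inside it.
div_content.each do |item|

   puts item.text

   break if item.css('h3').text == "Finalizados"

end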

You should be able to do this with a CSS selector:

css = 'h3:contains(Vigentes) ~ p:has(~ h3:contains(Finalizados))'

Unfortunately, Nokogiri doesn't handle this selector well, so we'll use XPath instead:

xpath = "//h3[contains(text(), 'Vigentes')]/following-sibling::p[./following-sibling::h3[contains(text(), 'Finalizados')]]"
page.search(xpath).each do |para|
  puts para.text  # process each paragraph as needed
end