Parsing XML sheet with Rails Nokogiri XPath for attributes and elements

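I am trying to use Nokogiri in a Rails 4.2.0 environment to parse a data sheet of classes. For each course, I want to parse and store the @catalog_nbr and @subject attributes and the first instructor listed. My code below just produces empty arrays. I believe the problem has to do with how I am using the .each method, but I can't figure it out!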

require 'nokogiri'
require 'open-uri'

doc = Nokogiri::HTML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("//course").each do
  num = doc.xpath("./@catalog_nbr").text
  subject = doc.xpath("./@subject").text
  instructor = doc.xpath("./sections/section/meeting/instructors/instructor")[1].text
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end

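Try this. After selecting the document, we need to loop through each row in it, so give the block a parameter (called row here) and run the relative XPath queries against row instead of doc. Assign default values if the results are empty. Read this article for more information.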

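doc.xpath("//course").each do |row|
  # NodeSet#text returns "" (never nil) when nothing matches, so use
  # ActiveSupport's String#presence to fall back to the default.
  num = row.xpath("./@catalog_nbr").text.presence || "N/A"
  subject = row.xpath("./@subject").text.presence || "N/A"
  # at_xpath returns the first matching node, or nil if there is none.
  first = row.at_xpath("./sections/section/meeting/instructors/instructor")
  instructor = first ? first.text : "N/A"
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end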

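Here's a working solution. Note that the XML file you linked to always has a catalog number and subject for every course, so none of the || "N/A" fallbacks are strictly needed (but they may be nice to have for safety):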

require 'nokogiri'
require 'open-uri'

doc = Nokogiri.XML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("/courses/course").each do |course|
  num  = course["catalog_nbr"] || "N/A"  # in case it doesn't exist
  subj = course["subject"]     || "N/A"  # in case it doesn't exist
  inst = (course.at("sections/section/meeting/instructors/instructor/text()") || "N/A").to_s
  data = { subject:subj, number:num, instructor:inst }
  p data
end

#=> {:subject=>"CS", :number=>"1110", :instructor=>"Van Loan,C (cfv3)"}
#=> {:subject=>"CS", :number=>"1112", :instructor=>"Fan,K (kdf4)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1132", :instructor=>"Fan,K (kdf4)"}
#=> etc.