Parsing an XML sheet with Rails Nokogiri XPath for attributes and elements
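I am trying to use Nokogiri in a Rails 4.2.0 environment to parse a data sheet of classes. For each course, I want to store the @catalog_nbr and @subject attributes and the first listed instructor. My code below just produces empty arrays. I believe the problem is related to how I use the .each method, but I can't figure it out!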
require 'nokogiri'
require 'open-uri'
doc = Nokogiri::HTML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("//course").each do
  num = doc.xpath("./@catalog_nbr").text
  subject = doc.xpath("./@subject").text
  instructor = doc.xpath("./sections/section/meeting/instructors/instructor")[1].text
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end
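Try this.
After selecting the document, we need to iterate over each row in it. Let's call each row `row`, and run the XPath queries against `row` rather than against `doc`.
Next, assign default values if the results are empty. Read this article for more information.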
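doc.xpath("//course").each do |row|
  # NodeSet#text returns "" when nothing matches, and "" is truthy in Ruby,
  # so use ActiveSupport's #presence (available in Rails) to turn "" into nil.
  num = row.xpath("./@catalog_nbr").text.presence || "N/A"
  subject = row.xpath("./@subject").text.presence || "N/A"
  # at_xpath returns the first matching instructor, or nil if there is none.
  first_instructor = row.at_xpath("./sections/section/meeting/instructors/instructor")
  instructor = first_instructor ? first_instructor.text : "N/A"
  Course.create(:subject => subject, :number => num, :instructor => instructor)
end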
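A note on defaulting with `||`: it only falls back when the left-hand side is nil or false, and Ruby treats an empty string as truthy, so `"" || "N/A"` keeps the empty string. The plain-Ruby sketch below illustrates the difference. The helper name `or_default` is hypothetical and is not part of Nokogiri or Rails; in a Rails app, ActiveSupport's `presence` does a similar job.

```ruby
# Hypothetical helper: return `default` when `value` is nil or an empty string.
def or_default(value, default = "N/A")
  (value.nil? || value.empty?) ? default : value
end

EMPTY = ""

# An empty string is truthy, so a bare || never reaches the default.
PLAIN_OR = EMPTY || "N/A"

# The explicit emptiness check does fall back.
GUARDED = or_default(EMPTY)
MISSING = or_default(nil)
PRESENT = or_default("CS")

p PLAIN_OR, GUARDED, MISSING, PRESENT
```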
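Here is a working solution. Note that the XML file you linked to always contains the catalog number and subject for every course, so none of the `|| "N/A"` fallbacks are strictly needed (though they may be nice to have for safety):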
require 'nokogiri'
require 'open-uri'
doc = Nokogiri.XML( open("https://courseroster.reg.cornell.edu/courses/roster/SP15/CS/xml/") )
doc.xpath("/courses/course").each do |course|
  num  = course["catalog_nbr"] || "N/A" # in case it doesn't exist
  subj = course["subject"] || "N/A" # in case it doesn't exist
  inst = (course.at("sections/section/meeting/instructors/instructor/text()") || "N/A").to_s
  data = { subject: subj, number: num, instructor: inst }
  p data
end
#=> {:subject=>"CS", :number=>"1110", :instructor=>"Van Loan,C (cfv3)"}
#=> {:subject=>"CS", :number=>"1112", :instructor=>"Fan,K (kdf4)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1130", :instructor=>"Frey,C (ccf27)"}
#=> {:subject=>"CS", :number=>"1132", :instructor=>"Fan,K (kdf4)"}
#=> etc.