Parse XML that has a node with multiple values with Nokogiri
I'm not sure which XML syntax is correct, so I'll show two variants; please point out the right one.
My XML has a node with multiple values:
Case 1:
<items>
<item>
<image_urls>http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg
</image_urls>
</item>
</items>
Case 2:
<items>
<item>
<image_urls>
http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg
</image_urls>
<image_urls>http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg
</image_urls>
</item>
</items>
The problem I'm facing is getting the values of the multi-value node with Nokogiri. I tried:
item.at("image_urls").to_s.split(" ").inject([]) { |result, element|
result << element
}
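But this only works for the first XML variant. If the correct syntax is the second form, which I believe it is, how can I get both values, given that my implementation below only gets the first one?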
xml = Nokogiri::XML(File.open(self.file.current_path))
xml.xpath("//item").each do |item|
  attachments_array = item.at("image_urls").inject([]) { |result, element|
    result << element
  }
end
You need to use the css method, which returns all matches, rather than the at method, which returns only the first match:
require 'nokogiri'

text = <<EOD
<items>
<item>
<image_urls>http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg
</image_urls>
</item>
<item>
<image_urls>
http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg</image_urls>
<image_urls>http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg
</image_urls>
</item>
</items>
EOD
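xml = Nokogiri::XML(text)
xml.css('item').each do |item|
  # css returns every matching <image_urls> node, not just the first.
  # split with no arguments splits on any whitespace and ignores leading and
  # trailing whitespace, so no strip is needed (strip! would return nil when
  # there is nothing to strip).
  attachments = item.css('image_urls').flat_map { |url| url.text.split }
  p attachments
end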
# ["http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg", "http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg"]
# ["http://static.elefant.ro/images/26/95226/husa-belkin-grip-pentru-kindle-3-ebook-reader-albastru_1_categorie.jpg", "http://www.keenthemes.com/preview/metronic/theme/assets/global/plugins/jcrop/demos/demo_files/image1.jpg"]