How to get descendant nodes from XML based on an attribute
I'm trying to get the descendant child nodes of a node:
require 'nokogiri'
@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
nom_id = @doc.xpath('//race/nomination/@id')
race_id.each do |x|
puts race_id.traverse {|race_id| puts nom_id }
end
I'm looking at two sources of information:
The documentation for Nokogiri::XML::Node, which has Nokogiri::XML::Node#children
sparklemotion's cheat sheet:
node.traverse {|node| } # yields all children and self to a block, _recursively_
Here is my test XML:
<meeting id="42977">
<race id="215411">
<nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
<nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
<nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
</race>
<race id="215412">
<nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
<nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
<nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
</race>
</meeting>
I can easily get the race ids using XPath:
require 'nokogiri'
@doc = Nokogiri::XML(File.open('data/20160521RHIL0.xml'))
race_id = @doc.xpath('//race/@id')
nom_id = @doc.xpath('//race/nomination/@id')
...
215411
215412
How can I get the nomination id and number for race_id 215411 and store them in a hash (as shown below)?
{215411 => [{id:198926, number:8},{id:198965, number:2}]}
require 'nokogiri'
# xml data
str =<<-EOS
<meeting id="42977">
<race id="215411">
<nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
<nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
<nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
</race>
<race id="215412">
<nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
<nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
<nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
</race>
</meeting>
EOS
# create doc
doc = Nokogiri::XML(str)
# clean: remove whitespace-only text nodes so that children returns only elements
doc.xpath('//text()[not(normalize-space())]').remove
# parse doc
parsed_doc = doc.xpath('//race').inject({}) {|h,x| h[x.get_attribute('id').to_i] = x.children.map {|y| {id: y.get_attribute('id').to_i, number: y.get_attribute('number').to_i}}; h}
# {215411=>
# [{:id=>198926, :number=>8},
# {:id=>198965, :number=>2},
# {:id=>199260, :number=>1}],
# 215412=>
# [{:id=>199634, :number=>1},
# {:id=>208926, :number=>2},
# {:id=>122923, :number=>3}]}
# select via id
parsed_doc.select {|k,v| k == 215411}
# {215411=>
# [{:id=>198926, :number=>8},
# {:id=>198965, :number=>2},
# {:id=>199260, :number=>1}]}
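Because parsed_doc is an ordinary Hash keyed by race id, a direct lookup is a possible alternative to select when only one race is needed. The following is a minimal sketch, assuming a hash shaped like the output above; the literal is a shortened stand-in for illustration, not something parsed from the XML.

```ruby
# Shortened stand-in for the parsed_doc built above (illustration only).
parsed_doc = {
  215411 => [{ id: 198926, number: 8 }, { id: 198965, number: 2 }],
  215412 => [{ id: 199634, number: 1 }]
}

# Direct lookup returns just the array of nominations (nil if the id is absent).
nominations = parsed_doc[215411]

# Hash#slice (Ruby 2.5+) keeps the {race_id => nominations} shape, as select does.
single_race = parsed_doc.slice(215411)
```

The lookup avoids scanning every key, which select with a block has to do.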
Here is the one-liner written across multiple lines:
parsed_doc = doc.xpath('//race').inject({}) do |h,x|
  h[x.get_attribute('id').to_i] = x.children.map do |y|
    {
      id: y.get_attribute('id').to_i,
      number: y.get_attribute('number').to_i
    }
  end
  h
end
I'd do it like this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<meeting id="42977">
<race id="215411">
<nomination number="8" saddlecloth="8" horse="Chipanda" id="198926" />
<nomination number="2" saddlecloth="2" horse="Chifries" id="198965" />
<nomination number="1" saddlecloth="1" horse="Itpanda" id="199260" />
</race>
<race id="215412">
<nomination number="1" saddlecloth="1" horse="Ruby" id="199634" />
<nomination number="2" saddlecloth="2" horse="Gems" id="208926" />
<nomination number="3" saddlecloth="3" horse="Rock" id="122923" />
</race>
</meeting>
EOT
race_id = 215411
nominations = doc.at("race[id='#{race_id}']")
  .search('nomination')
  .map{ |nomination|
    {
      number: nomination['number'].to_i,
      id: nomination['id'].to_i
    }
  }
{race_id => nominations}
# => {215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}
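race[id='#{race_id}'] builds a CSS selector that finds the desired race node. From there it's easy to find the nomination nodes.
Note that I don't use children or traverse, because they return all nodes, including text nodes, not just element nodes. I'd have to add extra logic to ignore the text nodes, which wastes time and space.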
Your question isn't clear, but if you want to return the information for all races, it's a simple adjustment:
doc.search('race').map{ |race|
  nominations = race.search('nomination')
    .map{ |nomination|
      {
        number: nomination['number'].to_i,
        id: nomination['id'].to_i
      }
    }
  {race['id'].to_i => nominations}
}
# => [{215411=>[{:number=>8, :id=>198926}, {:number=>2, :id=>198965}, {:number=>1, :id=>199260}]}, {215412=>[{:number=>1, :id=>199634}, {:number=>2, :id=>208926}, {:number=>3, :id=>122923}]}]