How to search for some XML data and replace it with a new value using the Nokogiri Ruby gem

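Given the sample XML file employees.xml below, and using the Ruby Nokogiri gem, I want to open the file, change Sandra Defoe's building number to 320 and her room number to 99, and then save the changes. What is the recommended way to do this?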

<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>

Assuming your content is a string:

xml=%q(
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>)

require 'nokogiri'

doc = Nokogiri.parse(xml)

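This works, but it assumes that the combination of first name and last name is unique; otherwise it modifies only the first employee that matches both.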

target = doc.css('employee').find do |node|
  node.search('firstname').text == 'Sandra' &&
  node.search('lastname').text == 'Defoe'
end

target.at_css('building').content = '320'
target.at_css('room').content = '99'

doc # outputs the updated xml
=> <?xml version="1.0"?>
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be129">
        <firstname>Jane</firstname>
        <lastname>Doe</lastname>
        <building>327</building>
        <room>19</room>
    </employee>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>320</building>
        <room>99</room>
    </employee>
    <employee id="be133">
        <firstname>Steve</firstname>
        <lastname>Casey</lastname>
        <building>327</building>
        <room>24</room>
    </employee>
</employees>

I'd use this:

require 'nokogiri'

doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-16"?>
<employees>
    <employee id="be130">
        <firstname>William</firstname>
        <lastname>Defoe</lastname>
        <building>326</building>
        <room>14a</room>
    </employee>
    <employee id="be132">
        <firstname>Sandra</firstname>
        <lastname>Defoe</lastname>
        <building>327</building>
        <room>22</room>
    </employee>
</employees>
EOT

first_name = 'Sandra'
last_name = 'Defoe'
node = doc.at("//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
node.at('building').content = '320'
node.at('room').content = '99'

Which results in:

doc.to_xml
# => "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?>\n" +
#    "<employees>\n" +
#    "    <employee id=\"be130\">\n" +
#    "        <firstname>William</firstname>\n" +
#    "        <lastname>Defoe</lastname>\n" +
#    "        <building>326</building>\n" +
#    "        <room>14a</room>\n" +
#    "    </employee>\n" +
#    "    <employee id=\"be132\">\n" +
#    "        <firstname>Sandra</firstname>\n" +
#    "        <lastname>Defoe</lastname>\n" +
#    "        <building>320</building>\n" +
#    "        <room>99</room>\n" +
#    "    </employee>\n" +
#    "</employees>\n"

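Normally I recommend CSS selectors because they tend to produce less visual noise, but CSS doesn't let us peek at the text of a node, and working around that, where it's possible at all, results in more noise. XPath, on the other hand, can be very noisy, but for this sort of task it's more useful.

XPath is well documented, so figuring out what it's doing should be easy.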

The Ruby side of it uses a "format string":

"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]

which works like this:

"%s %s" % [first_name, last_name] # => "Sandra Defoe"
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name] 
# => "//employee[firstname/text()='Sandra' and lastname/text()='Defoe']"

For completeness, if I wanted to use CSS exclusively, this is what I'd do:

node = doc.search('employee').find { |employee|
  employee.at('firstname').text == first_name && employee.at('lastname').text == last_name
}

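That gets ugly though, because search tells Nokogiri to retrieve all the employee nodes from libXML, then Ruby has to walk through every one of them, telling Nokogiri to tell libXML to find the child firstname and lastname nodes and return their text. That's slow, especially if there are a lot of employee nodes and the one you want is at the bottom of the file.

The XPath selector tells Nokogiri to pass the search to libXML, which parses it, finds the employee node whose children contain the first and last names, and returns only that node. It's much faster.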

Note that at('employee') is equivalent to search('employee').first:

   # File 'lib/nokogiri/xml/searchable.rb', line 70

   def at(*args)
     search(*args).first
   end

Finally, meditate on the difference between NodeSet#text and Node#text, because the first can lead to insanity.