How to search for some XML data and replace it with a new value using the Nokogiri Ruby gem
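Based on the sample XML file employees.xml below, and using the Ruby Nokogiri gem, I want to open the file, change Sandra Defoe's building number to 320 and her room number to 99, and then save the changes. What is the recommended way to do this?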
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>
Assuming your content is a string:
require 'nokogiri'

xml=%q(
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>)
doc = Nokogiri.parse(xml)
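This works, but it assumes the combination of first name and last name is unique; otherwise it modifies only the first employee that matches both.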
target = doc.css('employee').find do |node|
node.search('firstname').text == 'Sandra' &&
node.search('lastname').text == 'Defoe'
end
target.at_css('building').content = '320'
target.at_css('room').content = '99'
doc # outputs the updated xml
=> <?xml version="1.0"?>
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be129">
<firstname>Jane</firstname>
<lastname>Doe</lastname>
<building>327</building>
<room>19</room>
</employee>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>320</building>
<room>99</room>
</employee>
<employee id="be133">
<firstname>Steve</firstname>
<lastname>Casey</lastname>
<building>327</building>
<room>24</room>
</employee>
</employees>
I'd use this:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<?xml version="1.0" encoding="utf-16"?>
<employees>
<employee id="be130">
<firstname>William</firstname>
<lastname>Defoe</lastname>
<building>326</building>
<room>14a</room>
</employee>
<employee id="be132">
<firstname>Sandra</firstname>
<lastname>Defoe</lastname>
<building>327</building>
<room>22</room>
</employee>
</employees>
EOT
first_name = 'Sandra'
last_name = 'Defoe'
node = doc.at("//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name])
node.at('building').content = '320'
node.at('room').content = '99'
Which results in:
doc.to_xml
# => "\uFEFF<?xml version=\"1.0\" encoding=\"utf-16\"?>\n" +
# "<employees>\n" +
# " <employee id=\"be130\">\n" +
# " <firstname>William</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>326</building>\n" +
# " <room>14a</room>\n" +
# " </employee>\n" +
# " <employee id=\"be132\">\n" +
# " <firstname>Sandra</firstname>\n" +
# " <lastname>Defoe</lastname>\n" +
# " <building>320</building>\n" +
# " <room>99</room>\n" +
# " </employee>\n" +
# "</employees>\n"
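Normally I recommend CSS selectors because they tend to produce less visual noise. However, CSS doesn't let us peek into the text of nodes, and working around that, where it's possible at all, produces even more noise. XPath, on the other hand, can be very noisy, but for this sort of task it's much more useful.
XPath is very well documented, so figuring out what this expression is doing should be easy.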
The Ruby side of it uses a "format string":
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]
This works like any other format string:
"%s %s" % [first_name, last_name] # => "Sandra Defoe"
"//employee[firstname/text()='%s' and lastname/text()='%s']" % [first_name, last_name]
# => "//employee[firstname/text()='Sandra' and lastname/text()='Defoe']"
For completeness, if I wanted to use CSS exclusively, I'd do this:
node = doc.search('employee').find { |node|
node.at('firstname').text == first_name && node.at('lastname').text == last_name
}
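This gets ugly, though, because search tells Nokogiri to retrieve all the employee nodes from libXML, and then Ruby has to walk through every one of them, telling Nokogiri to tell libXML to look up the child firstname and lastname nodes and return their text. That's slow, especially if there are a lot of employee nodes and the one you want is at the bottom of the file.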
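The XPath selector tells Nokogiri to pass the search to libXML, which parses the expression, finds the employee node whose child nodes contain the first and last names, and returns only that node. It's much faster.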
Note that at('employee') is equivalent to search('employee').first.
# File 'lib/nokogiri/xml/searchable.rb', line 70
def at(*args)
search(*args).first
end
Finally, meditate on the difference between NodeSet#text and Node#text, because the first can lead to insanity.