How to search for prop elements in TMX with Nokogiri

Question

I have a TMX translation memory file that I need to parse so I can import it into a new database. I am using Ruby + Nokogiri. This is the TMX (XML) structure:

<body>
<tu creationdate="20181001T113609Z" creationid="some_user">
<prop type="Att::Attribute1">Value1</prop>
<prop type="Txt::Attribute2">Value2</prop>
<prop type="Txt::Attribute3">Value3</prop>
<prop type="Txt::Attribute4">Value4</prop>
<tuv xml:lang="EN-US">
<seg>Testing</seg>
</tuv>
<tuv xml:lang="SL">
<seg>Testiranje</seg>
</tuv>
</tu>
</body>

For simplicity, I have included only one tu node here.

This is my current script:

require 'nokogiri'

doc = File.open("test_for_import.xml") { |f| Nokogiri::XML(f) }

doc.xpath('//tu').each do |x|
  puts "Creation date: " + x.attributes["creationdate"]
  puts "User: " + x.attributes["creationid"]

  x.children.each do |y|
    puts y.children
  end

end

This produces the following result:

Creation date: 20181001T113609Z
User: some_user
Value1
Value2
Value3
Value4

<seg>Testing</seg>


<seg>Testiranje</seg>

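What I need is to search for Attribute1 and its corresponding value and assign it to a variable. These will then be used as attributes when creating translation records in the new database. I need the same for seg, to get the source and the translation. I don't want to rely on the order of the elements, even though it should be (and is) always the same.

What is the best way to proceed? All the elements are of class Nokogiri::XML::NodeSet. Even after looking at the docs I am still stuck.

Can someone help?

Best, Sebastian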

Answer 1

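The easiest way to traverse a node tree like this is with XPath. You are already using XPath to get the top-level tu elements, but you can extend your XPath queries to get the specific elements you are after.

DevHints has a handy cheat sheet of what you can do with XPath.

Relative to your x variable, which points at a tu element, these are the XPaths you will want to use: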

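prop[@type="Att::Attribute1"] to find your prop for Attribute 1
.//seg or tuv/seg to find the seg elements (a leading // without the dot searches the whole document, not just the current tu)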

Here is a full code example using these XPaths. The at_xpath method returns one result (the first match, or nil if there is none), while the xpath method returns all results.

require 'nokogiri'

doc = File.open("test_for_import.xml") { |f| Nokogiri::XML(f) }

doc.xpath('//tu').each do |x|
  puts "Creation date: " + x.attributes["creationdate"]
  puts "User: " + x.attributes["creationid"]

  # Get Attribute 1
  # There should only be one result for this, so using `at_xpath`
  attr1 = x.at_xpath('prop[@type="Att::Attribute1"]')
  puts "Attribute 1: " + attr1.text

  # Get each seg inside this tu
  # There will be many results, so using `xpath`; the leading dot keeps
  # the search relative to x instead of the whole document
  segs = x.xpath('.//seg')
  segs.each do |seg|
    puts "Seg: " + seg.text
  end
end

This outputs:

Creation date: 20181001T113609Z
User: some_user
Attribute 1: Value1
Seg: Testing
Seg: Testiranje
