How to deal with the "<?" annotation in an XML file with Python

Question

Does anyone know how to handle this kind of XML annotation with Python? This is the first time I have seen it.

<?link id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3" resource-id="570935" resource-type="fork" type="ResourceLink"?>

I need to query this kind of 'element' to get the resource-uuid value.

Answer 1

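You have to distinguish between processing instructions and the XML declaration.

Both are written the same way: <?SomeName SomeContent ?>.

You can find details in section 2.6 of the XML specification.

While the XML declaration must come first and start with <?xml, other processing instructions may occur (almost) anywhere in the XML.

A processing instruction must have a name (its target), but its content is not formally restricted the way an element's content is. It is free text...

So this is well-formed XML: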

<root>
  <a>test</a>
  <?piName some test?>
</root>
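
For Python readers, here is a minimal sketch of how such a document looks to a parser, using only the standard library's xml.dom.minidom (not the lxml approach shown later in this thread). The helper name iter_processing_instructions is made up for illustration. It shows that a PI is a node of its own, with a target (the name) and a data string (the free text):

```python
from xml.dom import minidom

# The well-formed example from above.
XML = """<root>
  <a>test</a>
  <?piName some test?>
</root>"""


def iter_processing_instructions(node):
    """Yield every processing-instruction node below `node`, depth first.

    Hypothetical helper for illustration; not part of minidom itself.
    """
    for child in node.childNodes:
        if child.nodeType == child.PROCESSING_INSTRUCTION_NODE:
            yield child
        else:
            # Elements are searched recursively; text nodes have no children.
            yield from iter_processing_instructions(child)


doc = minidom.parseString(XML)
found = [(pi.target, pi.data) for pi in iter_processing_instructions(doc)]
print(found)
```

The target and the data come back as two plain strings. The parser does not interpret the data any further.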

I don't work with Python, but this returns your PI in SQL Server:

DECLARE @xml XML=
N'<root>
    <a>test</a>
    <?link id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3" resource-id="570935" resource-type="fork" type="ResourceLink"?>
  </root>';

SELECT @xml.query('/root/processing-instruction("link")');

Even though your content looks like attributes: within a PI the content is free text. So you have to parse your information out of the content...
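
In Python, one way to do that parsing is a small regular expression over the PI content. This is only a sketch: it assumes the content really follows the name="value" convention (single or double quotes), and both the name pattern and the function name parse_pseudo_attributes are simplifications chosen for illustration, not part of any library:

```python
import re

# Matches name="value" or name='value'. The name pattern is a deliberate
# simplification of real XML names (an assumption for this sketch).
PSEUDO_ATTR = re.compile(r"""([\w.:-]+)\s*=\s*(["'])(.*?)\2""", re.DOTALL)


def parse_pseudo_attributes(pi_content):
    """Return the attribute-like pairs found in a PI's free-text content."""
    return {m.group(1): m.group(3) for m in PSEUDO_ATTR.finditer(pi_content)}


# The content of the <?link ...?> PI from the question, without the target.
content = (
    'id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3" '
    'resource-id="570935" resource-type="fork" type="ResourceLink"'
)
attrs = parse_pseudo_attributes(content)
print(attrs["resource-uuid"])
```

Anything in the content that does not look like a quoted pair is silently ignored, which may or may not be what you want for PIs that carry other kinds of text.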

This answer might help you.

Answer 2

If your processor supports XQuery 3.1, here is one way to solve the problem:

declare function local:values($pi) {
  map:merge(
    for $pair in tokenize($pi)
    let $key := substring-before($pair, '=')
    let $value := replace(substring-after($pair, '='), '^"|"$', '')
    return map:entry($key, $value)
  )
};

let $xml := document {
  <xml>
    <?link id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3"
      resource-id="570935" resource-type="fork" type="ResourceLink"?>
  </xml>
}
for $pi in $xml//processing-instruction('link')
let $values := local:values($pi)
return $values?resource-uuid

An alternative solution for older versions of XQuery:

let $xml := document {
  <xml>
    <?link id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3"
      resource-id="570935" resource-type="fork" type="ResourceLink"?>
  </xml>
}
for $pi in $xml//processing-instruction('link')
for $pair in tokenize($pi, '\s+')[substring-before(., '=') = 'resource-uuid']
return replace(substring-after($pair, '='), '^"|"$', '')

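Both code snippets assume that the values in your processing instructions are all built like the ones in your example (key and value separated by an equals sign, value enclosed in double quotes).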
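
For comparison, the same tokenizing idea can be sketched in Python. The function name pi_values is made up for illustration. Because it splits on whitespace, it additionally assumes that no value contains whitespace:

```python
def pi_values(pi_content):
    """Split PI content on whitespace and read each token as key="value".

    Hypothetical helper mirroring the XQuery above; it relies on the same
    assumptions about how the content is written.
    """
    values = {}
    for pair in pi_content.split():
        # Split at the first equals sign only.
        key, _, value = pair.partition("=")
        # Drop the surrounding double quotes.
        values[key] = value.strip('"')
    return values


# PI content spread over two lines, as in the XQuery examples.
content = '''id="752760" resource-uuid="UUID-9f0575a3-1847-1cde-fd35-f18014fdecf3"
      resource-id="570935" resource-type="fork" type="ResourceLink"'''
print(pi_values(content)["resource-uuid"])
```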

Answer 3

What you call an "annotation" is known as a processing instruction.

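It is very common to use a keyword="value" syntax in processing instructions that resembles XML element attributes, but unfortunately this is only a convention, not something intrinsic to XML, so you have to parse the content yourself to extract the attributes. (Saxon has a function saxon:get-pseudo-attribute() for this.)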

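If you are in Python, it is probably simpler to do this extra parsing step in Python code rather than in XPath code, unless you really need the value as part of some larger XPath expression, in which case the details depend on whether you are using XPath or XQuery, and which version.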

Answer 4

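Thanks, everyone. I learned about processing instructions and, building on that, worked out how to handle them. In case anyone needs it, here it is from the start: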

from lxml import etree

...

file = 'path/to/file.xml'
tree = etree.parse(file)
result = tree.xpath('//processing-instruction("link")')
for pi in result:
    # Each pi is a processing instruction whose target is 'link'
    if pi.get('type') == 'ResourceImport':
        # PI whose pseudo-attribute type is ResourceImport
        print(pi.text)  # Check the text of this PI

With the lxml library it is easy to get the processing instructions using XPath.

I hope this snippet helps anyone who ends up here because of the same problem.

python

xml

xpath

xquery

processing-instruction