How do I skip validating global declaration issues in lxml?

Question

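How can I skip the error Element 'baz': No matching global declaration available for the validation root., line 1?

I need to validate a generic set of XML/XSD pairs that are not necessarily structured alike in any way, so hardcoded/literal rules written for one specific XML structure are not applicable.

The XSDs are generated by GMC Inspire Designer, which is not normally an XML validator and is very "loose" about checking syntax. The global declaration issue shows up in my local validator but, because of that looseness, not in Inspire Designer.

How do I specify a particular set of errors that lxml would produce and continue validating past them?

I am using the following code: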

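from os import listdir
from os.path import isfile, join

from lxml import etree

# my_path (the working directory) is assumed to be defined elsewhere.
# xsd_path is a placeholder for the .xsd file; the original passed my_path here.

# Get a list of all .xml files in the working directory
xml_files_from_cwd = [xml_f for xml_f in listdir(my_path)
                      if isfile(join(my_path, xml_f))
                      and xml_f.lower().endswith(".xml")]

# XMLSchema expects the schema file itself, not a directory
xml_validator = etree.XMLSchema(file=xsd_path)

# One recovering parser can be reused for every file
recovering_parser = etree.XMLParser(recover=True)

for xml in xml_files_from_cwd:
    xml_file = etree.parse(join(my_path, xml), parser=recovering_parser)

    try:
        # assertValid() returns None on success and raises DocumentInvalid on failure
        xml_validator.assertValid(xml_file)
    except etree.DocumentInvalid as e:
        print(f"File not valid: {e}")
    else:
        print(f"No errors detected in {xml}.")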

I am having trouble validating XML files where the XML looks like this:

<baz>
  <bar BEGIN="1">
  ... [repeating elements here]
  </bar>
</baz>

and an XSD that follows this format:

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
                  .... [repeating elements here]
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Answer 1

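The answer to the question "can we continue validating a file past the initial failure condition" appears to be no, because there is no guarantee that any further validation would produce a positive result outside this particular example.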
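As a rough illustration, the sketch below validates a document whose root has no global declaration and prints what lxml records in `error_log`. The content of `bar` and the type of the `BEGIN` attribute are filled in hypothetically, since the question elides them.

```python
from lxml import etree

# Hypothetical completion of the question's schema (bar's content is elided there).
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="BEGIN" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

XML = b"""<baz>
  <bar BEGIN="1"/>
</baz>"""

schema = etree.XMLSchema(etree.fromstring(XSD))
doc = etree.fromstring(XML)

# validate() returns a bool instead of raising and fills schema.error_log
is_valid = schema.validate(doc)
errors = [(e.line, e.type_name, e.message) for e in schema.error_log]

print(is_valid)
for line, type_name, message in errors:
    print(line, type_name, message)
```

The log reports the missing declaration for the root. As far as I can tell, nothing useful is said about `bar`, because the validator has no declaration to check the children of `baz` against.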

Answer 2

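The problem here is that validation depends on the validity of the document as a whole.

For example, suppose your files are validated against this schema: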

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
       <xs:choice>
         <xs:element name="bar">
            <xs:complexType>
                <xs:choice>
                    <xs:element name="baz"/>
                    <xs:element name="qux"/>
                </xs:choice>
            </xs:complexType>
         </xs:element>
         <xs:element name="quux">
            <xs:complexType>
                <xs:sequence>
                    <xs:element name="qux"/>
                </xs:sequence>
            </xs:complexType>
         </xs:element>
       </xs:choice>    
    </xs:complexType>
  </xs:element>
</xs:schema>

Then this file would be problematic:

<foo>
  <quuz>
    <qux/>
    ...
  </quuz>
</foo>

Should quuz have been bar or quux?

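You may be able to tell from what follows it, but then every time you run into a problem you have to backtrack to each earlier decision and try a different choice at that point.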
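To make the ambiguity concrete, here is a minimal sketch that tries each candidate name for quuz on a copy of the document and records which ones validate. It assumes the elided "..." in the example file is empty; the helper name `names_that_validate` is made up for this illustration.

```python
import copy

from lxml import etree

XSD = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:choice>
        <xs:element name="bar">
          <xs:complexType>
            <xs:choice>
              <xs:element name="baz"/>
              <xs:element name="qux"/>
            </xs:choice>
          </xs:complexType>
        </xs:element>
        <xs:element name="quux">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="qux"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:choice>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# Assumption: the "..." in the example document is empty.
XML = b"<foo><quuz><qux/></quuz></foo>"

schema = etree.XMLSchema(etree.fromstring(XSD))
original = etree.fromstring(XML)


def names_that_validate(root, bad_tag, candidates):
    """Return the candidates that make the document valid when they replace bad_tag."""
    accepted = []
    for name in candidates:
        trial = copy.deepcopy(root)  # never mutate the original while guessing
        for el in list(trial.iter(bad_tag)):
            el.tag = name
        if schema.validate(trial):
            accepted.append(name)
    return accepted


accepted = names_that_validate(original, "quuz", ["bar", "quux"])
print(accepted)
```

Under that assumption both names are accepted, so the validator alone cannot tell you which one was meant. With more content after qux the answer might narrow down, but only after trying every branch.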

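This gets very complicated very quickly, because whether something is valid can depend on its content, structure, attribute values, and so on. Soon there are too many options to test and the task becomes impossible. You can even think of cases where the number of choices is practically infinite, so you would need very complex logic just to arrive at valid values.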

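In simple cases, such as the example you showed where only the outer tag may be misnamed, you can simply fix the error in memory and retry validation. But that is not an approach that scales to a whole document.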
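For that simple case, a minimal sketch could look like the following. It assumes the schema has no target namespace and exactly one global element; the helper names and the completed content of `bar` are hypothetical, since the question elides the real schema body.

```python
from lxml import etree

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# Hypothetical completion of the question's schema (bar's content is elided there).
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="BEGIN" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

XML = b"""<baz>
  <bar BEGIN="1"/>
</baz>"""


def global_element_names(xsd_root):
    """Names of the top-level xsd:element declarations, i.e. the allowed roots."""
    return [el.get("name") for el in xsd_root.findall(f"{{{XSD_NS}}}element")]


def validate_with_root_fix(xsd_root, doc_root):
    """Validate; on failure, rename the root to the schema's only global element and retry."""
    schema = etree.XMLSchema(xsd_root)
    if schema.validate(doc_root):
        return True, None
    roots = global_element_names(xsd_root)
    if len(roots) == 1 and doc_root.tag != roots[0]:
        old_tag = doc_root.tag
        doc_root.tag = roots[0]  # in-memory fix only; the file on disk is untouched
        return schema.validate(doc_root), (old_tag, roots[0])
    return False, None


ok, renamed = validate_with_root_fix(etree.fromstring(XSD), etree.fromstring(XML))
print(ok, renamed)
```

Returning the rename alongside the result lets you report that the file only passed after a repair, rather than silently treating it as valid.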

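Note: in a real-life scenario you may actually know what to expect and what can go wrong. In that case you can follow a strategy of attempting validation and, if it fails, repeatedly fixing the problem until you reach the end of the document, because you do know what the options are. My answer only aims to point out that there is no general solution here.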
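A sketch of that validate, repair, retry loop might look like this. Everything specific in it is an assumption for illustration: the `KNOWN_RENAMES` table, the wrongly cased `BAR` element, the completed content of `bar`, and the function name.

```python
from lxml import etree

# Hypothetical completion of the question's schema (bar's content is elided there).
XSD = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="foo">
    <xsd:complexType>
      <xsd:sequence minOccurs="1" maxOccurs="1">
        <xsd:element name="bar" minOccurs="1" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="BEGIN" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

# Hypothetical document with two known problems: a misnamed root and a wrongly cased child.
XML = b'<baz><BAR BEGIN="1"/></baz>'

# Hypothetical table of repairs you already know are safe for your documents.
KNOWN_RENAMES = {"baz": "foo", "BAR": "bar"}


def validate_with_known_fixes(schema, root, renames, max_rounds=10):
    """Validate, apply one known rename per failed round, stop when valid or out of fixes."""
    applied = []
    for _ in range(max_rounds):
        if schema.validate(root):
            return True, applied
        # First element, in document order, that we know how to repair
        target = next((el for el in root.iter() if el.tag in renames), None)
        if target is None:
            return False, applied  # an error we do not know how to repair
        applied.append((target.tag, renames[target.tag]))
        target.tag = renames[target.tag]
    return schema.validate(root), applied


schema = etree.XMLSchema(etree.fromstring(XSD))
ok, applied = validate_with_known_fixes(schema, etree.fromstring(XML), KNOWN_RENAMES)
print(ok, applied)
```

The loop is bounded by `max_rounds` and gives up as soon as it meets a failure that is not in the table, which is exactly the point where a generic solution would have to start guessing.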

python

lxml