Accessing schema information during XML validation

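I have some erroneous XML that fails validation against its schema. The errors are almost all the same (empty elements that violate the document model), but they can occur on hundreds of different elements in the document.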

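My intended solution is to validate the document, collect the offending empty elements into a list of XElements from the SourceObject property of any resulting exception objects, and then remove those elements from the document. However, the SourceObject property is always null.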

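After reading up on this, I understand that the document object is not populated with schema information until validation has taken place. Even with that in mind, I still cannot get any useful information out of the validation process, because the relevant object properties are always null no matter when I try to access them.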

Here is what I have so far:

public void FixXml(string xmlDoc)
{
    XDocument doc = XDocument.Parse(xmlDoc);
    XmlSchemaSet schema = new XmlSchemaSet();
    schema.Add("", @"../../test.xsd");
    schema.Compile();

    doc.Validate(schema, (Callback));

    foreach (XElement element in errors)
    {
        // This is where I'd start making changes to the document if the list didn't contain a bunch of nulls.
    }
}

The callback method (I will probably fold this into a lambda once I am confident the code works):

private void Callback(object sender, ValidationEventArgs eventArgs)
{
    XmlSchemaValidationException ex = (eventArgs.Exception as XmlSchemaValidationException);

    if (ex != null)
    {
        XElement element = (ex.SourceObject as XElement);
        errors.Add(element);
    }
}

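This question and its answers were useful to me, and I have applied part of the solution to my own project, but it still does not seem to work. I feel like I am missing something obvious and silly here.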

The reason XmlSchemaValidationException.SourceObject is null is explained in the docs:

When an XmlSchemaValidationException is thrown during validation of a class that implements the IXPathNavigable interface such as the XPathNavigator or XmlNode class, the object returned by the SourceObject property is an instance of a class that implements the IXPathNavigable interface.

When an XmlSchemaValidationException is thrown during validation by a validating XmlReader object, the value of the SourceObject property is null.

Unfortunately, XDocument does not implement IXPathNavigable, so SourceObject is null.

If you only need SourceObject, you can call Extensions.CreateNavigator(this XNode node) to create a navigator for your document, then validate using XPathNavigator.CheckValidity(XmlSchemaSet, ValidationEventHandler), like so:

var errors = new List<XmlSchemaValidationException>();

ValidationEventHandler callback = (sender, args) =>
{
    var exception = (args.Exception as XmlSchemaValidationException);
    if (exception != null)
    {
        errors.Add(exception);
    }
};          

var navigator = doc.CreateNavigator();
navigator.CheckValidity(schema, callback);          

foreach (var exception in errors)
{
    var node = (XObject)exception.SourceObject;
    // Check for null before dereferencing the node below.
    Assert.IsTrue(node != null, "node != null");

    // Do something with the node.
    Console.WriteLine();
    Console.WriteLine(exception);
    Console.WriteLine("{0}: {1}", node.GetType(), node.ToString());
}

However, experiment shows that XmlSchemaException.SourceSchemaObject always seems to be null with this approach, and also XElement.IXmlSerializable.GetSchema() is not populated. I'm not sure why the source schema object is not passed in, but testing in .NET Core 3.0.0 shows it is not. (Possibly this is related to Issue #38748: XSD Validation Errors- Missing details on xsd schema error code, which was closed as not currently implemented.)

If you also need the source schema object, you will need to follow the approach in the documentation for Extensions.GetSchemaInfo() and validate the XDocument using Extensions.Validate(XDocument, XmlSchemaSet, ValidationEventHandler, Boolean addSchemaInfo). This populates the schema information into the LINQ to XML tree but, sadly, prevents SourceObject from being set. Instead, when errors are detected, you will need to traverse the XElement hierarchy looking for elements and attributes for which GetSchemaInfo() returns an IXmlSchemaInfo whose Validity is not Valid:

var errors = new List<XmlSchemaValidationException>();

ValidationEventHandler callback = (sender, args) =>
{
    var exception = (args.Exception as XmlSchemaValidationException);
    if (exception != null)
    {
        errors.Add(exception);
    }
};          

doc.Validate(schema, callback, true);           

foreach (var exception in errors)
{
    // Handle the exception itself.
    Console.WriteLine(exception);
}

if (errors.Count > 0)
{
    // If there were any errors, traverse the entire document looking for invalid nodes:
    DumpInvalidNodes(doc.Root);
}

The example method DumpInvalidNodes is adapted from the Microsoft docs:

//Taken from https://docs.microsoft.com/en-us/dotnet/api/system.xml.schema.extensions.getschemainfo?view=netframework-4.8#System_Xml_Schema_Extensions_GetSchemaInfo_System_Xml_Linq_XElement_
//with an added null check:
static void DumpInvalidNodes(XElement el)  
{  
    if (el.GetSchemaInfo().Validity != XmlSchemaValidity.Valid)  
        Console.WriteLine("Invalid Element {0}",  
            el.AncestorsAndSelf()  
            .InDocumentOrder()  
            .Aggregate("", (s, i) => s + "/" + i.Name.ToString()));  
    foreach (XAttribute att in el.Attributes())  
    {
        var si = att.GetSchemaInfo();

        // MUST CHECK FOR NULL HERE
        // Because w3 standard attributes like xmlns:xsi will have null SchemaInfo
        // when not included in the schema, rather than being reported as Invalid.
        if (si != null && si.Validity != XmlSchemaValidity.Valid)  
            Console.WriteLine("Invalid Attribute {0}",  
                att  
                .Parent  
                .AncestorsAndSelf()  
                .InDocumentOrder()  
                .Aggregate("",  
                    (s, i) => s + "/" + i.Name.ToString()) + "/@" + att.Name.ToString()  
                );  
    }
    foreach (XElement child in el.Elements())  
        DumpInvalidNodes(child);  
}

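Note that my testing shows the documentation code needed to be modified to check for XAttribute.GetSchemaInfo() returning null. This seems to happen for W3C standard attributes such as xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" when they are not explicitly included in the schema.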

Demo fiddle #2 here.

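Update: it seems that doc.CreateNavigator().CheckValidity(schema, callback) does not work on older versions of the full framework; for example, on .NET 4.7 it throws System.NotSupportedException: This XPathNavigator does not support XSD validation. Demo fiddle #3 here. If you run into this problem, you will have to use the second approach.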
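
For the original goal of stripping out the offending empty elements, the second approach might be adapted roughly as follows. This is only a sketch under some assumptions: the class and method names (InvalidEmptyElementRemover, RemoveInvalidEmptyElements) are hypothetical, "empty" is taken to mean "has no child nodes", and an element counts as offending when its own IXmlSchemaInfo.Validity is Invalid. Ancestors of an invalid element are typically reported as invalid too, which is why the filter looks only at childless elements.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.Schema;

// Hypothetical helper for illustration; not part of any library.
public static class InvalidEmptyElementRemover
{
    // Validates doc against schemas, removes every childless element that was
    // marked Invalid, and returns the number of elements removed.
    public static int RemoveInvalidEmptyElements(XDocument doc, XmlSchemaSet schemas)
    {
        // A non-null handler keeps Validate from throwing on the first error.
        // addSchemaInfo: true annotates the tree so GetSchemaInfo() can be used.
        doc.Validate(schemas, (sender, args) => { }, true);

        // Materialize the list first so the tree is not modified while it is enumerated.
        List<XElement> targets = doc.Descendants()
            .Where(el => !el.Nodes().Any()) // assumption: "empty" means no child nodes
            .Where(el =>
            {
                IXmlSchemaInfo info = el.GetSchemaInfo();
                return info != null && info.Validity == XmlSchemaValidity.Invalid;
            })
            .ToList();

        foreach (XElement el in targets)
        {
            el.Remove();
        }

        return targets.Count;
    }
}
```

After the removal, the schema info annotations describe the old tree, so it is probably worth validating the document again to confirm that no errors remain. If elements with invalid attributes but no children should be kept, an extra check such as !el.HasAttributes could be added to the filter.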