XML to object using XML and DTD

I have an XML file:

<?xml version="1.0"?>
<!DOCTYPE report SYSTEM "01.dtd" [
    <!ENTITY parameter "blablabla"> 
]>

<report xmlns="http://tempuri.org/report"
  details="Something is described &parameter;"
></report>

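I'm trying to parse this XML into an object, but after deserialization the details attribute contains: "Something is described &parameter;"

What I want instead is: "Something is described blablabla".

My code is as follows: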

class Program
{
    static void Main(string[] args)
    {
        ReadXMLwithDTD();
    }

    public static void ReadXMLwithDTD()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.XmlResolver = new XmlUrlResolver();
        settings.ValidationType = ValidationType.DTD;
        settings.DtdProcessing = DtdProcessing.Parse;
        settings.ValidationEventHandler += new ValidationEventHandler(ValidationCallBack);
        settings.IgnoreWhitespace = true;

        var files = Directory.GetFiles("../../../App_data/include/", "01.xml", SearchOption.AllDirectories);

        foreach (var file in files)
        {
            XmlDocument xmlDoc = new XmlDocument();

            using (StringReader sr = new StringReader(file))
            using (XmlReader reader = XmlReader.Create(sr, settings))
            {
                xmlDoc.Load(file);
            }

            report r = DeserializeToObject<report>(xmlDoc.OuterXml);
        }

        Console.ReadLine();
    }

    public static T DeserializeToObject<T>(string xml) where T : class
    {
        System.Xml.Serialization.XmlSerializer ser = new System.Xml.Serialization.XmlSerializer(typeof(T));

        MemoryStream memStream = new MemoryStream(Encoding.UTF8.GetBytes(xml));

        return (T)ser.Deserialize(memStream);
    }

    private static void ValidationCallBack(object sender, ValidationEventArgs e)
    {
        if (e.Severity == XmlSeverityType.Warning)
            Console.WriteLine("Warning: Matching schema not found.  No validation occurred." + e.Message);
        else // Error
            Console.WriteLine("Validation error: " + e.Message);
    }
}

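What should I change?

There is no need to load your XML into an intermediate XmlDocument. XmlSerializer can expand entities during deserialization, as long as you pass the serializer an XmlReader configured with DtdProcessing.Parse.

That is, if I generalize your deserialization code as follows: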

using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

public static partial class XmlSerializationHelper
{
    public static T LoadFromXmlWithDTD<T>(string filename, XmlSerializer serial = default, ValidationEventHandler validationCallBack = default)
    {
        var settings = new XmlReaderSettings
        {
            // This will throw an exception if uncommented:
            //   System.Xml.XmlException: An error has occurred while opening external DTD 'file:///app/01.dtd': Could not find file '/app/01.dtd'
            // XmlResolver = new XmlUrlResolver(), 
            DtdProcessing = DtdProcessing.Parse,
            IgnoreWhitespace = true,
        };
        settings.ValidationEventHandler += validationCallBack;
        serial = serial ?? new XmlSerializer(typeof(T));
        using (var reader = XmlReader.Create(filename, settings))
            return (T)serial.Deserialize(reader);
    }
}

Then you can call it like this:

var report = XmlSerializationHelper.LoadFromXmlWithDTD<report>(filename, validationCallBack: ValidationCallBack);

And Details will be expanded correctly:

Assert.AreEqual("Something is described blablabla", report.Details);
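To try the same idea without any files on disk, here is a minimal sketch that deserializes the XML from a string. The question does not show the report class, so the one below is a hypothetical stand-in, and EntityExpansionSketch, SampleXml and DeserializeFromString are illustrative names only. It also assumes XmlResolver = null, so the external 01.dtd is skipped and only the internal subset (where the entity is declared) is parsed.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical stand-in for the asker's type, which the question does not show.
[XmlRoot("report", Namespace = "http://tempuri.org/report")]
public class report
{
    [XmlAttribute("details")]
    public string Details { get; set; }
}

public static class EntityExpansionSketch
{
    // The XML from the question, inlined so that the sketch needs no files.
    public const string SampleXml = @"<?xml version=""1.0""?>
<!DOCTYPE report SYSTEM ""01.dtd"" [
    <!ENTITY parameter ""blablabla"">
]>
<report xmlns=""http://tempuri.org/report""
  details=""Something is described &parameter;""
></report>";

    public static report DeserializeFromString(string xml)
    {
        var settings = new XmlReaderSettings
        {
            // Parse the DTD so that internal entities are expanded.
            DtdProcessing = DtdProcessing.Parse,
            // Assumption: no resolver, so the external 01.dtd is never fetched.
            XmlResolver = null,
            IgnoreWhitespace = true,
        };
        var serializer = new XmlSerializer(typeof(report));
        using (var textReader = new StringReader(xml))
        using (var reader = XmlReader.Create(textReader, settings))
            // The serializer reads through the configured reader,
            // so it sees the attribute value with the entity already expanded.
            return (report)serializer.Deserialize(reader);
    }
}
```

Calling EntityExpansionSketch.DeserializeFromString(EntityExpansionSketch.SampleXml) should give an object whose Details reads "Something is described blablabla". The point is that the serializer is handed the configured reader directly, rather than a string or stream that bypasses your settings.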

Notes:

  • You may want to set XmlReaderSettings.MaxCharactersFromEntities:

    This property allows you to mitigate denial of service attacks where the attacker submits XML documents that attempt to exceed memory limits via expanding entities. By limiting the characters that result from expanded entities, you can detect the attack and recover reliably.

  • In the following code:

    using (StringReader sr = new StringReader(file)) 
    using (XmlReader reader = XmlReader.Create(sr, settings))
    {
        xmlDoc.Load(file);
    }
    

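    You create an XmlReader that uses a StringReader to parse the file name file as if it were an XML string rather than a file name, and then you ignore the reader you created and load the file contents directly by name with xmlDoc.Load(file);. This appears to bypass the settings you just constructed and may be the immediate cause of your problem.

  • Uncommenting XmlResolver = new XmlUrlResolver() will throw the exception Could not find file '/app/01.dtd' if the specified external DTD file (which was not included in your question) cannot be found.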
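As an illustration of the MaxCharactersFromEntities note above, here is a rough sketch of how the limit behaves. EntityLimitSketch, SampleXml and ReadDetails are hypothetical names, the inline document is a cut-down version of yours with no external DTD, and returning null on failure is just a convenience for the demo; real code would probably log or rethrow.

```csharp
using System;
using System.IO;
using System.Xml;

public static class EntityLimitSketch
{
    // Hypothetical inline document; the entity expands to 9 characters.
    public const string SampleXml =
        "<!DOCTYPE report [ <!ENTITY parameter \"blablabla\"> ]>" +
        "<report details=\"Something is described &parameter;\" />";

    // Returns the expanded attribute value, or null when the reader
    // rejects the document (for example, because the entity budget is exceeded).
    public static string ReadDetails(string xml, long maxCharactersFromEntities)
    {
        var settings = new XmlReaderSettings
        {
            DtdProcessing = DtdProcessing.Parse,
            XmlResolver = null, // assumption: never fetch external resources
            MaxCharactersFromEntities = maxCharactersFromEntities,
        };
        try
        {
            using (var textReader = new StringReader(xml))
            using (var reader = XmlReader.Create(textReader, settings))
            {
                // Skips the DOCTYPE node and stops on the <report> element.
                reader.MoveToContent();
                return reader.GetAttribute("details");
            }
        }
        catch (XmlException)
        {
            return null;
        }
    }
}
```

With a generous budget such as 1024 the attribute should come back fully expanded, while a budget of 5 is smaller than the 9-character entity and should make the reader throw an XmlException, which this sketch turns into null. Pick a limit that comfortably covers your legitimate documents.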

Demo fiddle here.