Are parameter entity references in SGML/XML parsable using .NET?

When I try to parse the data below using XDocument, I get the following error:

"XMLException: A parameter entity reference is not allowed in internal markup"

Here is the sample data I am trying to parse:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std       "standard SGML">
  <!ENTITY % signature " &#x2014; &author;.">
  <!ENTITY % question  "Why couldn&#x2019;t I publish my books directly in %std;?">
  <!ENTITY % author    "William Shakespeare">
]>
<sgml>&question;&signature;</sgml>

Here is the code that attempts to parse the file above:

string caFile = @"pathToFile";
using (var caStream = File.Open(caFile, FileMode.Open, FileAccess.Read))
{
    var caDoc = XDocument.Load(caStream); // Exception thrown here!
}

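Is there a way to get the built-in .NET XML parsing libraries to handle the entity references, or at least to ignore the embedded !DOCTYPE and parse the root element?

Note: I am assuming that parameter entity references are valid in XML. (see here)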

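There are a few problems here, but the main one is that you should be using general entities.

  1. You are defining your entities as parameter entities. These are essentially macros that can be used only inside the DTD itself. From the XML Specification: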

    Parameter-entity references MUST NOT appear outside the DTD.

    From XML in a Nutshell, 2nd Edition:

    It would be preferable to define a constant that can hold the common parts of the content specification for all five kinds of listings and refer to that constant from inside the content specification of each element. ...

    An entity reference is the obvious candidate here. However, general entity references are not allowed to provide replacement text for a content specification or attribute list, only for parts of the DTD that will be included in the XML document itself. Instead, XML provides a new construct exclusively for use inside DTDs, the parameter entity, which is referred to by a parameter entity reference. Parameter entities behave like and are declared almost exactly like a general entity. However, they use a % instead of an &, and they can only be used in a DTD while general entities can only be used in the document content.

    Your XML, however, refers to the entities in its document content. That indicates you should be using general entities rather than parameter entities.

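  2. One of your parameter entities, %question;, embeds a reference to another parameter entity, %std;, in its replacement text. This is explicitly forbidden by the XML Specification: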

    In the internal DTD subset, parameter-entity references MUST NOT occur within markup declarations; they may occur where markup declarations can occur. (This does not apply to references that occur in external parameter entities or to the external subset.)

    So, again, you should be using general entities rather than parameter entities, since the former can be used "inside the DTD in places where they will eventually be included in the body of an XML document, for instance ... in the replacement text of another entity."

  3. You need to enable DTD processing by setting XmlReaderSettings.ProhibitDtd = false (.NET 3.5) or XmlReaderSettings.DtdProcessing = DtdProcessing.Parse (.NET 4.0 and later).
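The three points above can be checked with a small sketch. The class name `EntityDemo`, the helper `Describe`, and the shortened sample documents are hypothetical and exist only for illustration; the sketch assumes .NET 4.0 or later, where `DtdProcessing` is available and defaults to `Prohibit` on a new `XmlReaderSettings`.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class EntityDemo
{
    // General entities: declared without '%', referenced as &name; in content.
    public const string GeneralEntities = @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY std ""standard SGML"">
  <!ENTITY question ""Published in &std;?"">
]>
<sgml>&question;</sgml>";

    // A parameter entity referenced inside a declaration of the internal subset.
    public const string ParameterEntities = @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY % std ""standard SGML"">
  <!ENTITY % question ""Published in %std;?"">
]>
<sgml/>";

    // Returns the root element's text, or "XmlException" if parsing fails.
    public static string Describe(string xml, XmlReaderSettings settings)
    {
        try
        {
            using (var sr = new StringReader(xml))
            using (var reader = XmlReader.Create(sr, settings))
            {
                return XDocument.Load(reader).Root.Value;
            }
        }
        catch (XmlException)
        {
            return "XmlException";
        }
    }

    public static void Main()
    {
        var parse = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        // Points 1 and 3: general entities plus DTD processing are expanded.
        Console.WriteLine(Describe(GeneralEntities, parse));

        // Point 3: default settings prohibit the DOCTYPE, so parsing fails.
        Console.WriteLine(Describe(GeneralEntities, new XmlReaderSettings()));

        // Point 2: a parameter entity reference inside internal markup fails.
        Console.WriteLine(Describe(ParameterEntities, parse));
    }
}
```

Only the first call should yield the expanded text; the other two are expected to end in an `XmlException`, for the reasons given in points 3 and 2 respectively.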

Putting this all together, the code looks like this:

    string xmlGood = @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY std       ""standard SGML"">
  <!ENTITY signature "" &#x2014; &author;."">
  <!ENTITY question  ""Why couldn&#x2019;t I publish my books directly in &std;?"">
  <!ENTITY author    ""William Shakespeare"">
]>
<sgml>&question;&signature;</sgml>";

    var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

    using (var sr = new StringReader(xmlGood))
    using (var xmlReader = XmlReader.Create(sr, settings))
    {
        var doc = XDocument.Load(xmlReader);
        Console.WriteLine(doc);
    }               

This produces the following output:

<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY std       "standard SGML">
  <!ENTITY signature " — &author;.">
  <!ENTITY question  "Why couldn’t I publish my books directly in &std;?">
  <!ENTITY author    "William Shakespeare">
]>
<sgml>Why couldn’t I publish my books directly in standard SGML? — William Shakespeare.</sgml>

As you can see, the general entities have been parsed and expanded.
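Applied to the file-based code from the question, the same settings might be used as sketched below. The class name `FileLoadSketch` and the helper `LoadWithDtd` are hypothetical, the temporary file merely stands in for the real `pathToFile`, and the sketch assumes the file has already been rewritten to use general entities.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class FileLoadSketch
{
    // Opens the file read-only and loads it with DTD processing enabled.
    public static XDocument LoadWithDtd(string path)
    {
        var settings = new XmlReaderSettings { DtdProcessing = DtdProcessing.Parse };

        using (var stream = File.Open(path, FileMode.Open, FileAccess.Read))
        using (var reader = XmlReader.Create(stream, settings))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        // Stand-in for the real file: a tiny document using a general entity.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, @"<!DOCTYPE sgml [
  <!ELEMENT sgml ANY>
  <!ENTITY author ""William Shakespeare"">
]>
<sgml>&author;</sgml>");

        try
        {
            XDocument doc = LoadWithDtd(path);
            Console.WriteLine(doc.Root.Value);
        }
        finally
        {
            File.Delete(path); // clean up the temporary file
        }
    }
}
```

Passing an explicit `XmlReader` keeps the DTD setting visible at the call site instead of relying on whatever the `XDocument.Load(Stream)` overload does internally.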