XMLException when processing RSS

I have been trying to use Argotic to process RSS feeds for my news reader application. It works fine for most of them, but on some feeds (like this one) it fails with the following:

Additional information: For security reasons DTD is prohibited in this XML document. To enable DTD processing set the DtdProcessing property on XmlReaderSettings to Parse and pass the settings into XmlReader.Create method.

The error is clear enough, so I passed in an XmlReaderSettings object with DtdProcessing enabled. But then I got the following:

An unhandled exception of type 'System.Xml.XmlException' occurred in System.Xml.dll Additional information: The ';' character, hexadecimal value 0x3B, cannot be included in a name. Line 9, position 366.

The code I am using:

    XmlReaderSettings settings = new XmlReaderSettings();
    settings.IgnoreComments = true;
    settings.IgnoreWhitespace = true;
    settings.DtdProcessing = DtdProcessing.Parse;

    XmlReader reader = XmlReader.Create(this.url, settings);
    RssFeed feed = new RssFeed();
    feed.Load(reader);

What am I missing?

The exception is telling you that the RSS feed is malformed: specifically, a name contains a ; character. The W3C specification appears to prohibit this:

Document authors are encouraged to use names which are meaningful words or combinations of words in natural languages, and to avoid symbolic or white space characters in names. Note that COLON, HYPHEN-MINUS, FULL STOP (period), LOW LINE (underscore), and MIDDLE DOT are explicitly permitted.

The ASCII symbols and punctuation marks, along with a fairly large group of Unicode symbol characters, are excluded from names

Since other RSS readers also complain about it, the feed is probably invalid. However, at the time of writing, the W3C validator shows it to be valid!

According to the MSDN documentation for XmlReaderSettings.ConformanceLevel, this issue will cause an exception whatever your ConformanceLevel, but you might find another setting in XmlReaderSettings that turns the behaviour off (supply the settings to XmlReader.Create). Otherwise, if the feed can't be fixed, you will have to do some pre-processing on it.
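What the pre-processing should do depends on what is actually wrong with the feed. One possibility, given that the second error only appeared once DTD parsing was switched on, is to drop the DOCTYPE declaration before the text reaches the parser. The following is a rough sketch under that assumption; `FeedPreprocessor`, `StripDoctype` and `CreateReader` are hypothetical names, and the regular expression is a simplification that assumes any internal subset contains no `]` character.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;
using System.Xml;

public static class FeedPreprocessor
{
    // Matches a DOCTYPE declaration, with or without an internal subset in [...].
    // Simplification: assumes the internal subset itself contains no ']'.
    private static readonly Regex Doctype =
        new Regex(@"<!DOCTYPE[^\[>]*(\[[^\]]*\])?\s*>");

    // Removes the first DOCTYPE declaration found in the XML text, if any.
    public static string StripDoctype(string xml)
    {
        return Doctype.Replace(xml, string.Empty, 1);
    }

    // Builds a reader over the cleaned text; DTD processing stays at its
    // default because no DOCTYPE should be left to trigger it.
    public static XmlReader CreateReader(string xml)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true
        };
        return XmlReader.Create(new StringReader(StripDoctype(xml)), settings);
    }
}
```

To use it, the feed would first have to be downloaded as text (for example with `WebClient.DownloadString`) and the resulting reader passed to `feed.Load(reader)` as in the question.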

It seems that setting DtdProcessing to Ignore solved my problem.

    settings.DtdProcessing = DtdProcessing.Ignore;
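For anyone who wants to see the difference between the settings in isolation, here is a minimal, self-contained sketch. It parses a small made-up document held in a string rather than the real feed, and `DtdDemo`, `Sample` and `TryReadRootName` are hypothetical names used only for illustration. Loading through Argotic would still go through `feed.Load(reader)` as in the question.

```csharp
using System;
using System.IO;
using System.Xml;

public static class DtdDemo
{
    // Made-up sample document with a DOCTYPE declaration (not the real feed).
    public const string Sample =
        "<!DOCTYPE rss [<!ELEMENT rss ANY>]>" +
        "<rss version=\"2.0\"><channel><title>Example</title></channel></rss>";

    // Returns the root element name, or null if the reader throws XmlException.
    public static string TryReadRootName(string xml, DtdProcessing mode)
    {
        var settings = new XmlReaderSettings
        {
            IgnoreComments = true,
            IgnoreWhitespace = true,
            DtdProcessing = mode
        };

        try
        {
            using (var reader = XmlReader.Create(new StringReader(xml), settings))
            {
                // Advances to the first content node, which here is the root element.
                reader.MoveToContent();
                return reader.Name;
            }
        }
        catch (XmlException)
        {
            return null;
        }
    }

    public static void Main()
    {
        // Prohibit is the default and rejects any document with a DOCTYPE.
        Console.WriteLine(TryReadRootName(Sample, DtdProcessing.Prohibit) ?? "(XmlException)");
        // Ignore skips the DOCTYPE instead of parsing or rejecting it.
        Console.WriteLine(TryReadRootName(Sample, DtdProcessing.Ignore) ?? "(XmlException)");
    }
}
```

One caveat: with `DtdProcessing.Ignore`, entities declared in the DTD are not defined, so a feed that relies on such entities in its body may still fail to parse.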