Use SyndicationFeed to load XML with encoded links
I'm reading an RSS feed with the following code:
var reader = XmlReader.Create(url);
SyndicationFeed.Load(reader);
The RSS looks like this. SyndicationFeed.Load throws an exception when a link tag contains encoded characters (in this example, å is encoded as %C3%A5):
<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<atom:link rel="self" type="application/rss+xml" href="http://example.com/rss" />
<title>My RSS</title>
<description>My RSS</description>
<pubDate>Mon, 04 Jul 2016 08:19:50 +0200</pubDate>
<generator>RSS Generator 1.1</generator>
<link>http://example.com/rss</link>
<item>
<title>A title</title>
<description>A description</description>
<link>http://bl%C3%A5ljus.se</link>
</item>
</channel>
</rss>
The exception is as follows:
System.Xml.XmlException: Error in line x position x. An error was encountered when parsing the item's XML. Refer to the inner exception for more details. --->
System.UriFormatException: Invalid URI: The hostname could not be parsed.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.Uri..ctor(String uriString, UriKind uriKind)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadAlternateLink(XmlReader reader, Uri baseUri)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)
--- End of inner exception stack trace ---
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItem(XmlReader reader, SyndicationFeed feed)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItems(XmlReader reader, SyndicationFeed feed, Boolean& areAllItemsRead)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadXml(XmlReader reader, SyndicationFeed result)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFeed(XmlReader reader)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadFrom(XmlReader reader)
at System.ServiceModel.Syndication.SyndicationFeed.Load[TSyndicationFeed](XmlReader reader)
System.UriFormatException: Invalid URI: The hostname could not be parsed.
at System.Uri.CreateThis(String uri, Boolean dontEscape, UriKind uriKind)
at System.Uri..ctor(String uriString, UriKind uriKind)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadAlternateLink(XmlReader reader, Uri baseUri)
at System.ServiceModel.Syndication.Rss20FeedFormatter.ReadItemFrom(XmlReader reader, SyndicationItem result, Uri feedBaseUri)
Is there a setting that tells SyndicationFeed to ignore parse errors when loading the XML? Or is there some other solution?
The problem appears to be the creation of the Uri. You can reproduce it with just this code:
var uri = new Uri("http://bl%C3%A5ljus.se");
One possible solution is to preprocess the XML and decode the link URLs before loading it as a SyndicationFeed.
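// Requires: using System.Net; using System.ServiceModel.Syndication; using System.Xml.Linq;
var doc = XDocument.Load(url);
// Decode percent-encoded characters in every <link> element before parsing the feed
foreach (var link in doc.Descendants("link"))
{
    link.Value = WebUtility.UrlDecode(link.Value);
}
using (var reader = doc.CreateReader())
{
    SyndicationFeed.Load(reader);
}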
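If decoding every link feels too broad, a narrower variant could touch only the links that System.Uri rejects. The following is a minimal sketch under that assumption, not a tested production fix: FeedLinkFixer, NormalizeLink and NormalizeLinks are hypothetical names introduced here, and it uses Uri.UnescapeDataString instead of WebUtility.UrlDecode so that a '+' in an otherwise valid link is not turned into a space.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class; not part of System.ServiceModel.Syndication.
public static class FeedLinkFixer
{
    // Returns the link unchanged when System.Uri already accepts it,
    // otherwise returns the percent-decoded form.
    public static string NormalizeLink(string link)
    {
        Uri parsed;
        if (Uri.TryCreate(link, UriKind.RelativeOrAbsolute, out parsed))
        {
            return link; // already parseable, leave it alone
        }
        // Decodes %XX sequences as UTF-8; '+' is left as-is.
        return Uri.UnescapeDataString(link);
    }

    // Rewrites every un-namespaced <link> element in place.
    // atom:link elements live in another namespace and are not matched.
    public static void NormalizeLinks(XDocument doc)
    {
        // ToList() takes a snapshot so the tree is not edited while it is being enumerated.
        foreach (var link in doc.Descendants("link").ToList())
        {
            link.Value = NormalizeLink(link.Value.Trim());
        }
    }
}
```

To use it, call FeedLinkFixer.NormalizeLinks(doc) in place of the foreach loop above, then pass doc.CreateReader() to SyndicationFeed.Load as before.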