Java - Rome: I am trying to parse an RSS feed but get an error on some channels
I am trying to consume an RSS feed and parse it. I found Rome and am trying to use it with this code:
private SyndFeed parseFeed(String url) throws IllegalArgumentException, FeedException, IOException {
    return new SyndFeedInput().build(new XmlReader(new URL(url)));
}

public Boolean processRSSContent(String url) {
    try {
        SyndFeed theFeed = this.parseFeed(url);
        SyndEntry entry = theFeed.getEntries().get(0);
        ZonedDateTime entryUtcDate = ZonedDateTime.ofInstant(entry.getPublishedDate().toInstant(), ZoneOffset.UTC);
        String entryTitle = entry.getTitle();
        String entryText = entry.getDescription().getValue();
        // Added so the method compiles: every path must return a value.
        return true;
    }
    catch (ParsingFeedException e) {
        e.printStackTrace();
        return false;
    }
    catch (FeedException e) {
        e.printStackTrace();
        return false;
    }
    catch (IOException e) {
        e.printStackTrace();
        return false;
    }
}
With http://feeds.bbci.co.uk/news/world/rss.xml everything works fine, but on some other channels like http://habrahabr.ru/rss/ I get an error:
Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
com.rometools.rome.io.ParsingFeedException: Invalid XML: Error on line 5: The element type "meta" must be terminated by the matching end-tag "</meta>".
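I looked at the content behind this link, and the XML is really strange. But it is a popular site, and I get this error on some other sites too, so I don't believe the XML is wrong. What am I doing wrong? How can I read this RSS channel?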
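If you put the URL http://habrahabr.ru/rss/ into your browser, you'll notice that it redirects to https://habrahabr.ru/rss/interesting. Your code does not handle redirects. The "meta" element in the error message suggests that the parser received an HTML page rather than the feed XML.
I suggest using HttpClientFeedFetcher from the rome-fetcher module, which handles redirects and has other advanced features (caching, conditional GET, compression):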
HttpClientFeedFetcher feedFetcher = new HttpClientFeedFetcher();
try {
    SyndFeed feed = feedFetcher.retrieveFeed(new URL("http://habrahabr.ru/rss/"));
    System.out.println(feed.getLink());
} catch (IllegalArgumentException | IOException | FeedException | FetcherException e) {
    e.printStackTrace();
}
Edit: rome-fetcher is being deprecated, but Apache HttpClient can be used instead, and it is more flexible.
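For reference, here is a minimal sketch of fetching the feed yourself so that redirects are followed before Rome sees the bytes. Note that it uses the JDK's built-in java.net.http.HttpClient (Java 11+) rather than Apache HttpClient, only to avoid an extra dependency. The class name FeedDownloader, the method name open, and the 10-second timeout are my own choices for illustration, not part of Rome. Redirect.NORMAL follows redirects, including from http to https, which plain HttpURLConnection does not do.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper (not part of Rome): downloads a feed and follows redirects.
public class FeedDownloader {

    // Redirect.NORMAL follows redirects, except from https back to http.
    private static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10)) // assumed timeout, adjust as needed
            .build();

    // Returns the response body of the final (post-redirect) URL as a stream.
    // The caller is responsible for closing the stream.
    public static InputStream open(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<InputStream> response =
                CLIENT.send(request, HttpResponse.BodyHandlers.ofInputStream());
        if (response.statusCode() != 200) {
            response.body().close();
            throw new IOException("Unexpected HTTP status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

The stream can then be handed to Rome in place of the URL, along the lines of new SyndFeedInput().build(new XmlReader(FeedDownloader.open(url))), with the stream closed afterwards (for example in a try-with-resources block).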