Java: Passing user-agent vars to fetch RSS data from webserver
I've been trying to fetch an RSS feed through Java and I keep getting a 403 error. I searched around, and it's apparently due to an empty User-Agent variable.
Here is what I've tried so far:
try {
    url = new URL("http://*****.com/feed/");
    InputStream is = null;
    try {
        URLConnection con = url.openConnection();
        con.addRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)");
        con.connect();
        is = con.getInputStream();
        feed = FeedParser.parse(con.getURL());
    } catch (IOException e) {
        System.out.println("error");
        try {
            throw e;
        } catch (IOException e1) {
            // TODO Auto-generated catch block
            e1.printStackTrace();
        }
    } finally {
        if (is != null)
            try {
                is.close();
            } catch (IOException e) {
                // TODO Auto-generated catch block
                e.printStackTrace();
            }
    }
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (FeedIOException e) {
    e.printStackTrace();
} catch (FeedXMLParseException e) {
    e.printStackTrace();
} catch (UnsupportedFeedException e) {
    e.printStackTrace();
}
int items = feed.getItemCount();
for (int i = 1; i <= items; i++) {
    FeedItem item = feed.getItem(i - 1);
    System.out.println(i + " Title: " + item.getTitle());
}
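I'm having trouble getting this to work, and I'm sure I'm not doing it correctly. The library I'm using to parse the RSS feed is feed4j.
Thanks in advance.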
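Feed4j doesn't support setting request properties. FeedParser.parse(con.getURL()) receives only the URL, not the URLConnection you configured, so it has to make its own request, and the User-Agent header you set is never sent. You can't get around this unless you modify the FeedParser class to something like this: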
public static Feed parse(URL url, String userAgent) throws IOException, FeedIOException, FeedXMLParseException, UnsupportedFeedException {
    // SAXReader, Document and DocumentException are dom4j classes
    // (org.dom4j.io.SAXReader, org.dom4j.Document, org.dom4j.DocumentException).
    URLConnection con = url.openConnection();
    if (userAgent != null) {
        // Request headers must be set before the request is sent.
        con.addRequestProperty("User-Agent", userAgent);
    }
    con.connect();
    // Parse directly from this connection's stream so the header is actually used;
    // try-with-resources closes the stream once parsing is done.
    try (InputStream is = con.getInputStream()) {
        SAXReader saxReader = new SAXReader();
        Document document = saxReader.read(is);
        int code = FeedRecognizer.recognizeFeed(document);
        switch (code) {
            case FeedRecognizer.RSS_1_0:
                return TypeRSS_1_0.feed(url, document);
            case FeedRecognizer.RSS_2_0:
                return TypeRSS_2_0.feed(url, document);
            case FeedRecognizer.ATOM_0_3:
                return TypeAtom_0_3.feed(url, document);
            case FeedRecognizer.ATOM_1_0:
                return TypeAtom_1_0.feed(url, document);
            default:
                throw new UnsupportedFeedException();
        }
    } catch (DocumentException e) {
        throw new FeedXMLParseException(e);
    }
}
Also on GitHub.
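To see why the header set in the question never reaches the server, here is a minimal, self-contained sketch that uses only java.net.HttpURLConnection (no feed4j). It starts the JDK's built-in com.sun.net.httpserver.HttpServer as a hypothetical local stand-in for the real feed server; the /feed/ path and the rule that rejects a missing, empty, or default "Java/<version>" User-Agent are assumptions for illustration, not the behaviour of the actual site. It also assumes the http.agent system property is not set, so Java sends its default User-Agent.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentDemo {

    // Hypothetical local stand-in for the real feed server: it answers 403 when the
    // User-Agent is missing, empty, or Java's default ("Java/<version>"), else 200.
    static HttpServer startServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/feed/", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            boolean blocked = ua == null || ua.isEmpty() || ua.startsWith("Java/");
            byte[] body = (blocked ? "forbidden" : "<rss version=\"2.0\"/>")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Opens a NEW connection for url, sets User-Agent only if one is given,
    // and returns the HTTP status code of the response.
    static int status(URL url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            con.setRequestProperty("User-Agent", userAgent);
        }
        int code = con.getResponseCode();
        con.disconnect();
        return code;
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/feed/").toURL();
            String ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)";

            // 1. No explicit header: Java sends its default User-Agent and is rejected.
            System.out.println("default User-Agent: " + status(url, null));

            // 2. Header set on the connection that actually sends the request: accepted.
            System.out.println("custom User-Agent: " + status(url, ua));

            // 3. Header set on one connection, but only its URL is passed on, so the next
            //    request (like the one FeedParser.parse(URL) must make) lacks the header.
            HttpURLConnection configured = (HttpURLConnection) url.openConnection();
            configured.setRequestProperty("User-Agent", ua);
            System.out.println("new connection from getURL(): " + status(configured.getURL(), null));
        } finally {
            server.stop(0);
        }
    }
}
```

Under those assumptions, the first and third requests come back 403 and the second 200. That is the same pattern as in the question: the header only helps when it is set on the connection that performs the request, which is why the modified parse(URL, String) opens and reads the connection itself.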