How to get the "Title" from a webpage using HttpClient
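I'm trying to get the "Title" from a webpage using Apache HttpClient 4.
Edit: My first approach was to try to get it from the header (using HttpHead). If that is not possible, how can I get it from the body of the response, as @Todd says?
Edit 2: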
<head>
[...]
<title>This is what I need to get!</title>
[...]
</head>
Thank you all for your comments. With jsoup the solution turned out to be quite simple.
Document doc = Jsoup.connect("http://example.com/").get();
String title = doc.title();
Considering that I really do need to use HttpClient for the connection, this is what I have:
org.jsoup.nodes.Document doc = null;
String title = "";
System.out.println("Getting content... ");
CloseableHttpClient httpclient = HttpClients.createDefault();
HttpHost target = new HttpHost(host);
HttpGet httpget = new HttpGet(path);
CloseableHttpResponse response = httpclient.execute(target, httpget);
System.out.println("Parsing content... ");
try {
    String line = null;
    StringBuilder tmp = new StringBuilder();
    // Decode the body as UTF-8 while reading it; re-encoding each line afterwards
    // with line.getBytes() would go through the platform default charset.
    BufferedReader in = new BufferedReader(new InputStreamReader(response.getEntity().getContent(), "UTF-8"));
    while ((line = in.readLine()) != null) {
        tmp.append(" ").append(line);
    }
    doc = Jsoup.parse(tmp.toString());
    title = doc.title();
    System.out.println("Title=" + title); //<== ^_^
    //[...]
} finally {
    response.close();
}
System.out.println("Done.");
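The reading loop above depends on how the response bytes are turned into characters, so here is a minimal sketch of that step in isolation, using only the JDK. The class name `BodyDecoder` and the method `readAll` are hypothetical, and a `ByteArrayInputStream` stands in for `response.getEntity().getContent()`. It assumes the page is UTF-8; a real page may declare a different charset in its Content-Type header.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of HttpClient or jsoup.
final class BodyDecoder {
    /** Reads the whole stream as text in the given charset, joining lines with a single space. */
    static String readAll(InputStream in, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for the bytes of an HTTP response body (assumed to be UTF-8).
        byte[] body = "<title>Caf\u00e9</title>\n<p>ok</p>".getBytes(StandardCharsets.UTF_8);
        // Decoded with the matching charset, the accented character survives.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.UTF_8));
        // Decoded with a mismatched charset, the same bytes come out garbled.
        System.out.println(readAll(new ByteArrayInputStream(body), StandardCharsets.ISO_8859_1));
    }
}
```

The point of the sketch is that the charset is chosen once, at the reader, and the resulting string can be passed straight to `Jsoup.parse`.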
With this code snippet, you can still retrieve the <title> of a web page by providing its URL.
InputStream response = null;
try {
    String url = "http://example.com/";
    response = new URL(url).openStream();
    Scanner scanner = new Scanner(response);
    // "\\A" matches the start of input, so next() returns the whole body as one token
    String responseBody = scanner.useDelimiter("\\A").next();
    System.out.println(responseBody.substring(responseBody.indexOf("<title>") + 7, responseBody.indexOf("</title>")));
} catch (IOException ex) {
    ex.printStackTrace();
} finally {
    try {
        // response is still null if openStream() threw
        if (response != null) {
            response.close();
        }
    } catch (IOException ex) {
        ex.printStackTrace();
    }
}
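The `indexOf`/`substring` extraction above throws if the page has no `<title>`, and it misses tags written as `<TITLE>` or carrying attributes. A slightly more tolerant variant might look like the sketch below. The class `TitleExtractor` and method `extractTitle` are hypothetical names, and this is still only a regex over raw text: it does not decode HTML entities or skip comments, so a real parser such as jsoup remains the safer choice.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only.
final class TitleExtractor {
    // Matches <title>, optionally with attributes, in any letter case.
    // DOTALL lets the title text span several lines.
    private static final Pattern TITLE = Pattern.compile(
            "<title\\b[^>]*>(.*?)</title\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed, whitespace-collapsed title text, or empty if no title element is found. */
    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(m.group(1).trim().replaceAll("\\s+", " "));
    }

    public static void main(String[] args) {
        String html = "<html><head><TITLE lang=\"en\">\n  This is what I need to get!\n</TITLE></head></html>";
        System.out.println(extractTitle(html).orElse("(no title)"));
        System.out.println(extractTitle("<html><body>no head</body></html>").orElse("(no title)"));
    }
}
```

Returning an `Optional` makes the missing-title case explicit to the caller instead of surfacing it as a `StringIndexOutOfBoundsException`.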