What makes Jsoup faster than HttpURLConnection & HttpClient in most cases?
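I want to compare the performance of the three implementations mentioned in the title, so I wrote a small Java program to do it. The main method contains three test blocks, each of which looks like this: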
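nb = 0; time = 0;
for (int i = 0; i < 7; i++) {
    double v = methodX(url);   // methodX is one of the three test methods below
    if (v > 0) {               // a negative value marks a failed request
        nb++;
        time += v;
    }
}
if (nb == 0) nb = 1;           // avoid division by zero when every request failed
System.out.println("HttpClient : " + (time / ((double) nb)) + ". Tries " + nb + "/7");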
The variable nb is used to leave failed requests out of the average. The method methodX is one of the following:
private static double testWithNativeHUC(String url) {
    try {
        HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
        httpURLConnection.addRequestProperty("User-Agent", UA); // UA: User-Agent constant defined elsewhere in the class
        long before = System.currentTimeMillis();
        // read the whole body line by line; try-with-resources closes the reader
        try (BufferedReader bufferedReader = new BufferedReader(
                new InputStreamReader(httpURLConnection.getInputStream()))) {
            while (bufferedReader.readLine() != null) { /* discard lines */ }
        }
        return System.currentTimeMillis() - before;
    } catch (IOException e) {
        e.printStackTrace();
        return -1;
    }
}
private static double testWithHC(String url) {
    // a new client is built (and closed) on every call, so no connection is reused
    try (CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent(UA).build()) {
        BasicResponseHandler basicResponseHandler = new BasicResponseHandler();
        long before = System.currentTimeMillis();
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
            basicResponseHandler.handleResponse(response); // reads the whole body as a String
        }
        return System.currentTimeMillis() - before;
    } catch (IOException e) {
        e.printStackTrace();
        return -1;
    }
}
private static double testWithJsoup(String url) {
    try {
        long before = System.currentTimeMillis();
        Jsoup.connect(url).execute().parse(); // fetches the page and also parses it into a Document
        return System.currentTimeMillis() - before;
    } catch (IOException e) {
        e.printStackTrace();
        return -1;
    }
}
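The three blocks in main differ only in the method they call and the label they print, so they could be folded into a single helper. This is a minimal sketch assuming Java 8+; `TimingLoop` and `averageMillis` are hypothetical names. It also adds optional warm-up runs whose timings are discarded, and it counts a 0 ms result as a success (the question's `v > 0` check would count it as a failure).

```java
import java.util.function.ToDoubleFunction;

public class TimingLoop {

    // Hypothetical helper: runs `test` `warmups` times without recording,
    // then `tries` times, averaging only successful runs. As in the question,
    // a negative return value means the request failed.
    static String averageMillis(String label, ToDoubleFunction<String> test,
                                String url, int warmups, int tries) {
        for (int i = 0; i < warmups; i++) {
            test.applyAsDouble(url); // warm-up run: result discarded
        }
        int nb = 0;
        double time = 0;
        for (int i = 0; i < tries; i++) {
            double v = test.applyAsDouble(url);
            if (v >= 0) { // 0 ms is a valid (fast) result; only negatives are failures
                nb++;
                time += v;
            }
        }
        double avg = (nb == 0) ? 0 : time / nb; // no division by zero if every run failed
        return label + " : " + avg + ". Tries " + nb + "/" + tries;
    }

    public static void main(String[] args) {
        // stand-in test that always "takes" 10 ms
        System.out.println(averageMillis("Fake", u -> 10.0, "https://example.com", 2, 7));
    }
}
```

With the question's static methods this would be called as, for example, `averageMillis("Jsoup", YourClass::testWithJsoup, url, 2, 7)`, where `YourClass` stands for whatever class holds those methods.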
This is the output I got (average time in ms):
For URL https://whosebug.com:
HttpUrlConnection : 325.85714285714283. Tries 7/7
HttpClient : 299.0. Tries 7/7
Jsoup : 172.42857142857142. Tries 7/7
For URL https://online.vfsglobal.dz:
HttpUrlConnection : 104.57142857142857. Tries 7/7
HttpClient : 181.0. Tries 7/7
Jsoup : 57.857142857142854. Tries 7/7
For URL https://google.com/:
HttpUrlConnection : 251.28571428571428. Tries 7/7
HttpClient : 259.57142857142856. Tries 7/7
Jsoup : 299.85714285714283. Tries 7/7
For URL https://algeria.blsspainvisa.com/book_appointment.php:
HttpUrlConnection : 112.57142857142857. Tries 7/7
HttpClient : 194.85714285714286. Tries 7/7
Jsoup : 67.42857142857143. Tries 7/7
For URL https://tunisia.blsspainvisa.com/book_appointment.php:
HttpUrlConnection : 439.2857142857143. Tries 7/7
HttpClient : 283.42857142857144. Tries 7/7
Jsoup : 144.71428571428572. Tries 7/7
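Repeating the test gives the same results. I did not put any sleep between requests, so that the test finishes quickly; I believe this does not affect the results much.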
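EDIT: I actually went through Jsoup's source code, and it uses HttpURLConnection together with BufferedInputStream. I tried using both the same way with HttpURLConnection, but got the same results. As you can see, the difference is clear: Jsoup seems noticeably faster than HttpURLConnection, even though it uses HttpURLConnection itself!
Thanks in advance.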
Your benchmark is not meaningful.
I wrote a JMH microbenchmark for the three libraries, and the results show no significant difference.
Benchmark                                  Mode  Cnt    Score   Error  Units
HttpBenchmark.httpClientGoogle             avgt    2  151.162          ms/op
HttpBenchmark.httpClientWhosebug           avgt    2  151.086          ms/op
HttpBenchmark.httpUrlConnectionGoogle      avgt    2  235.869          ms/op
HttpBenchmark.httpUrlConnectionWhosebug    avgt    2  145.162          ms/op
HttpBenchmark.jsoupGoogle                  avgt    2  391.162          ms/op
HttpBenchmark.jsoupWhosebug                avgt    2  188.059          ms/op
My test differs from yours in only a few small ways:
- JSoup sets the header "Accept-Encoding: gzip", which reduces bandwidth
- JSoup uses a larger buffer (32 KB)
- the HttpClient instance should be reused
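The first two differences can be tried with plain HttpURLConnection. This is a minimal sketch, not Jsoup's own code; the class `GzipFetch` and its `fetch` method are hypothetical names. It sends `Accept-Encoding: gzip`, wraps the stream in a `GZIPInputStream` only when the server replies with `Content-Encoding: gzip` (HttpURLConnection does not decompress by itself), and copies the body through a 32 KB buffer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.zip.GZIPInputStream;

public class GzipFetch {

    // 32 KB, the buffer size the answer attributes to Jsoup
    private static final int BUFFER_SIZE = 32 * 1024;

    // Hypothetical helper: GET `url`, asking for gzip, and return the decoded body bytes.
    static byte[] fetch(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestProperty("Accept-Encoding", "gzip");
        // Decompress only if the server actually chose gzip; otherwise read raw bytes.
        // Closing `raw` a second time after `in` is harmless (Closeable contract).
        try (InputStream raw = conn.getInputStream();
             InputStream in = "gzip".equalsIgnoreCase(conn.getContentEncoding())
                     ? new GZIPInputStream(raw, BUFFER_SIZE)
                     : raw) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[BUFFER_SIZE];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            // reading to EOF and closing lets HttpURLConnection keep the connection alive
            return out.toByteArray();
        }
    }
}
```

Timing `fetch(url)` against the question's `testWithNativeHUC` would suggest how much of the gap comes from compression alone; how large that share is depends on the server and the network, so any such measurement only holds for that setup.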
In my test, JSoup is the slowest. Of course, only JSoup parses the response.
My benchmark:
import java.io.BufferedInputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.jsoup.Jsoup;
import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 1, time = 3, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Threads(1)
public class HttpBenchmark {
    private static final String GOOGLE = "https://google.com/";
    private static final String Whosebug = "https://whosebug.com";

    // one client shared by every invocation, so its connections can be reused
    private final CloseableHttpClient httpClient = HttpClientBuilder.create().build();

    @TearDown
    public void closeClient() throws Exception {
        httpClient.close();
    }

    @Benchmark
    public void httpClientGoogle() throws Exception {
        httpClient(GOOGLE);
    }
    @Benchmark
    public void httpClientWhosebug() throws Exception {
        httpClient(Whosebug);
    }
    @Benchmark
    public void httpUrlConnectionGoogle() throws Exception {
        httpUrlConnection(GOOGLE);
    }
    @Benchmark
    public void httpUrlConnectionWhosebug() throws Exception {
        httpUrlConnection(Whosebug);
    }
    @Benchmark
    public void jsoupGoogle() throws Exception {
        jsoup(GOOGLE);
    }
    @Benchmark
    public void jsoupWhosebug() throws Exception {
        jsoup(Whosebug);
    }

    private void httpClient(final String url) throws Exception {
        final BasicResponseHandler basicResponseHandler = new BasicResponseHandler();
        try (CloseableHttpResponse response = httpClient.execute(new HttpGet(url))) {
            basicResponseHandler.handleResponse(response); // reads the whole body as a String
        }
    }

    private void httpUrlConnection(final String url) throws Exception {
        final HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
        // same header Jsoup sends; the body is read as raw (possibly gzip-compressed) bytes
        httpURLConnection.addRequestProperty("Accept-Encoding", "gzip");
        try (final BufferedInputStream r = new BufferedInputStream(httpURLConnection.getInputStream())) {
            final byte[] tmp = new byte[1024 * 32]; // 32 KB, like Jsoup
            while (r.read(tmp) != -1) {
                // discard the bytes: only the transfer is measured
            }
        }
    }

    private void jsoup(final String url) throws Exception {
        Jsoup.connect(url).execute().parse(); // the only variant that also parses the HTML
    }
}