What makes Jsoup faster than HttpURLConnection and HttpClient in most cases?

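I wanted to compare the performance of the three implementations mentioned in the title, so I wrote a small Java program to help me do that. The main method contains three test blocks, each of which looks like this: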

        nb=0; time=0;
        for (int i = 0; i < 7; i++) {
            double v = methodX(url);
            if(v>0){
                nb++;
                time+=v;
            }
        }
        if(nb==0) nb=1;
        System.out.println("HttpClient : "+(time/ ((double) nb))+". Tries "+nb+"/7");

The variable nb is used to exclude failed requests from the average. methodX is one of the following:

    private static double testWithNativeHUC(String url){
        try {
            HttpURLConnection httpURLConnection= (HttpURLConnection) new URL(url).openConnection();
            httpURLConnection.addRequestProperty("User-Agent", UA);
            long before = System.currentTimeMillis();
            BufferedReader bufferedReader= new BufferedReader(new InputStreamReader(httpURLConnection.getInputStream()));
            while (bufferedReader.readLine()!=null);
            return System.currentTimeMillis()-before;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

    private static double testWithHC(String url) {
        try {
            CloseableHttpClient httpClient = HttpClientBuilder.create().setUserAgent(UA).build();
            BasicResponseHandler basicResponseHandler = new BasicResponseHandler();
            long before = System.currentTimeMillis();
            CloseableHttpResponse response = httpClient.execute(new HttpGet(url));
            basicResponseHandler.handleResponse(response);
            return System.currentTimeMillis() - before;
        } catch (IOException e) {
            e.printStackTrace();
            return -1;
        }
    }

    private static double testWithJsoup(String url){
        try{
            long before = System.currentTimeMillis();
            Jsoup.connect(url).execute().parse();
            return System.currentTimeMillis()-before;
        }catch (IOException e){
            e.printStackTrace();
            return -1;
        }
    }
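
As a side note, the three test blocks in main differ only in the method they call and the label they print, so they could share one helper. The following is a minimal sketch of that idea, not part of the original program: the class name `AverageDemo`, the helper `average`, and the stand-in `fake` function are all hypothetical, and the stand-in replaces the real network calls so the example runs offline.

```java
import java.util.function.ToDoubleFunction;

class AverageDemo {

    // Calls test on url 'tries' times and averages only the successful runs
    // (result > 0), mirroring the nb/time bookkeeping of the test blocks.
    static double average(ToDoubleFunction<String> test, String url, int tries) {
        int nb = 0;
        double time = 0;
        for (int i = 0; i < tries; i++) {
            double v = test.applyAsDouble(url);
            if (v > 0) {
                nb++;
                time += v;
            }
        }
        // With no successful run the average is reported as 0.
        return nb == 0 ? 0 : time / nb;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in that pretends every request took 12 ms.
        ToDoubleFunction<String> fake = u -> 12.0;
        System.out.println("fake : " + average(fake, "https://example.com", 7));
    }
}
```

With the real methods, a call would look like `average(Main::testWithJsoup, url, 7)`, assuming the enclosing class is named `Main`.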

The output I get is as follows.

For the URL https://whosebug.com :

    HttpUrlConnection : 325.85714285714283. Tries 7/7
    HttpClient : 299.0. Tries 7/7
    Jsoup : 172.42857142857142. Tries 7/7

For the URL https://online.vfsglobal.dz :

    HttpUrlConnection : 104.57142857142857. Tries 7/7
    HttpClient : 181.0. Tries 7/7
    Jsoup : 57.857142857142854. Tries 7/7

For the URL https://google.com/ :

    HttpUrlConnection : 251.28571428571428. Tries 7/7
    HttpClient : 259.57142857142856. Tries 7/7
    Jsoup : 299.85714285714283. Tries 7/7

For the URL https://algeria.blsspainvisa.com/book_appointment.php :

    HttpUrlConnection : 112.57142857142857. Tries 7/7
    HttpClient : 194.85714285714286. Tries 7/7
    Jsoup : 67.42857142857143. Tries 7/7

For the URL https://tunisia.blsspainvisa.com/book_appointment.php :

    HttpUrlConnection : 439.2857142857143. Tries 7/7
    HttpClient : 283.42857142857144. Tries 7/7
    Jsoup : 144.71428571428572. Tries 7/7

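Repeating the tests gives the same results. I did not sleep between requests, in order to get results quickly, and I believe this does not affect the results much.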

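Edit: I actually looked through Jsoup's source, which shows that it uses HttpURLConnection and BufferedInputStream. I tried using both in my HttpURLConnection method, but got the same results. As you can see, the difference is clear: Jsoup appears to be significantly faster than HttpURLConnection even though it uses HttpURLConnection itself!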

Thanks in advance,

Your benchmark is not meaningful.

I wrote a microbenchmark for the three libraries, and the results show no significant difference.

Benchmark                                Mode  Cnt    Score   Error  Units
HttpBenchmark.httpClientGoogle           avgt    2  151.162          ms/op
HttpBenchmark.httpClientWhosebug         avgt    2  151.086          ms/op
HttpBenchmark.httpUrlConnectionGoogle    avgt    2  235.869          ms/op
HttpBenchmark.httpUrlConnectionWhosebug  avgt    2  145.162          ms/op
HttpBenchmark.jsoupGoogle                avgt    2  391.162          ms/op
HttpBenchmark.jsoupWhosebug              avgt    2  188.059          ms/op

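There are only a few small differences between your test and mine:

  • JSoup sets the header "Accept-Encoding: gzip", which reduces the bandwidth used
  • JSoup uses a larger buffer (32 KB)
  • The HttpClient instance needs to be reused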
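
To give a feel for the first two points without touching the network, here is a minimal offline sketch. It is an illustration under stated assumptions, not code from JSoup: the class name `GzipBufferDemo`, its helpers, and the repetitive sample markup are all made up. It gzips a page-like string, prints both sizes, and then reads the compressed data back through a GZIPInputStream using a 32 KB buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipBufferDemo {

    // Compresses bytes the way a server would for a gzip-encoded response body.
    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    // Drains a stream with a 32 KB buffer and returns the number of bytes read.
    static long drain(InputStream in) throws IOException {
        byte[] buffer = new byte[32 * 1024];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            total += read;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Made-up, repetitive markup standing in for a real page.
        StringBuilder html = new StringBuilder("<html><body>");
        for (int i = 0; i < 2000; i++) {
            html.append("<div class=\"row\"><a href=\"/item/")
                .append(i)
                .append("\">item</a></div>");
        }
        html.append("</body></html>");

        byte[] raw = html.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(raw);
        System.out.println("raw bytes:  " + raw.length);
        System.out.println("gzip bytes: " + compressed.length);

        // Decoding restores the original length.
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            System.out.println("decoded bytes: " + drain(in));
        }
    }
}
```

How much gzip saves depends on the page, and repetitive markup like this sample compresses particularly well. Also note that when the "Accept-Encoding" header is set by hand on an HttpURLConnection, the body typically arrives still compressed, so reading it as text would need a GZIPInputStream as in the sketch.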

In my test, JSoup is the slowest. Of course, JSoup is the only one that parses the response.

My benchmark:

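import java.io.BufferedInputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;

import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.BasicResponseHandler;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClientBuilder;
import org.jsoup.Jsoup;
import org.openjdk.jmh.annotations.*;

@Warmup(iterations = 1, time = 3, timeUnit = TimeUnit.SECONDS)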
@Measurement(iterations = 2, time = 5, timeUnit = TimeUnit.SECONDS)
@Fork(1)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Threads(1)
public class HttpBenchmark {

    private static final String GOOGLE          = "https://google.com/";
    private static final String Whosebug   = "https://whosebug.com";

    private final CloseableHttpClient httpClient = HttpClientBuilder.create().build();

    @Benchmark
    public void httpClientGoogle() throws Exception {
        httpClient(GOOGLE);
    }

    @Benchmark
    public void httpClientWhosebug() throws Exception {
        httpClient(Whosebug);
    }

    @Benchmark
    public void httpUrlConnectionGoogle() throws Exception {
        httpUrlConnection(GOOGLE);
    }

    @Benchmark
    public void httpUrlConnectionWhosebug() throws Exception {
        httpUrlConnection(Whosebug);
    }

    @Benchmark
    public void jsoupGoogle() throws Exception {
        jsoup(GOOGLE);
    }

    @Benchmark
    public void jsoupWhosebug() throws Exception {
        jsoup(Whosebug);
    }

    private void httpClient(final String url) throws Exception {
        final CloseableHttpResponse response = httpClient.execute(new HttpGet(url));
        final BasicResponseHandler basicResponseHandler = new BasicResponseHandler();
        basicResponseHandler.handleResponse(response);
        response.close();
    }

    private void httpUrlConnection(final String url) throws Exception {
        final HttpURLConnection httpURLConnection = (HttpURLConnection) new URL(url).openConnection();
        httpURLConnection.addRequestProperty("Accept-Encoding", "gzip");
        try (final BufferedInputStream r = new BufferedInputStream(httpURLConnection.getInputStream())) {
            final byte[] tmp = new byte[1024 * 32];
            int read;
            while (true) {
                read = r.read(tmp);
                if (read == -1) {
                    break;
                }
            }
        }
    }

    private void jsoup(final String url) throws Exception {
        Jsoup.connect(url).execute().parse();
    }

}