Not able to download a specific URL in Java

I am writing the following program to download a URL using Apache Commons IO, but I get a read timeout exception. Exception:

java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at sun.security.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1456)
at com.touseef.stock.FileDownload.main(FileDownload.java:23)

Program:

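    import java.io.File;
    import java.io.IOException;
    import java.net.MalformedURLException;
    import java.net.URL;
    import org.apache.commons.io.FileUtils;

    String urlStr = "https://www.nseindia.com/";
    // Backslashes in a Java string literal must be escaped.
    File file = new File("C:\\User\\WorkSpace\\Output.txt");
    URL url;
    try {
        url = new URL(urlStr);
        // Opens the URL with url.openStream() and copies the response to the file.
        FileUtils.copyURLToFile(url, file);
        System.out.println("Successfully Completed.");
    } catch (MalformedURLException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }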

Other sites download fine. Please advise. I am using the commons-io-2.6 jar.

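It seems this site is protected by some kind of web gateway (a DoS protection service such as Akamai?). Clients appear to be fingerprinted by their TLS connection and their HTTP request (headers), and only genuine web browsers are able to connect to the site.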

The following code uses Apache HttpClient 4.5 and works, at least for now:

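    import java.io.File;
    import java.nio.file.Files;
    import org.apache.http.client.methods.CloseableHttpResponse;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.impl.client.CloseableHttpClient;
    import org.apache.http.impl.client.HttpClients;
    import org.apache.http.util.EntityUtils;

    String urlStr = "https://www.nseindia.com/";
    // Backslashes in a Java string literal must be escaped.
    File file = new File("C:\\User\\WorkSpace\\Output.txt");
    String userAgent = "-";

    HttpGet httpget = new HttpGet(urlStr);
    httpget.addHeader("Accept-Language", "en-US");
    httpget.addHeader("Cookie", "");

    System.out.println("Executing request " + httpget.getRequestLine());
    // try-with-resources closes both the client and the response.
    try (CloseableHttpClient httpclient = HttpClients.custom().setUserAgent(userAgent).build();
         CloseableHttpResponse response = httpclient.execute(httpget)) {
        System.out.println("----------------------------------------");
        System.out.println(response.getStatusLine());
        String body = EntityUtils.toString(response.getEntity());
        System.out.println(body);
        // Files.writeString requires Java 11 or later.
        Files.writeString(file.toPath(), body);
    }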

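Requests that work in Firefox, for example, do not work from Java (because the TLS connection uses different protocols and cipher suites). I tried a number of combinations with Apache HttpClient, but they failed as well (even when the request was identical to one sent from Fiddler).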
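To see what the JVM offers in its TLS handshake, and therefore what a gateway could fingerprint, you can list the protocols and cipher suites that the default `SSLContext` enables. This is only a diagnostic sketch using standard JSSE calls; the class name `TlsDefaults` and its method names are made up for illustration, and the output depends on your JDK version and security configuration.

```java
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: prints the TLS settings the JVM enables by default.
class TlsDefaults {

    /** Protocol versions enabled by default for sockets of the default SSLContext. */
    static List<String> enabledProtocols() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
        return Arrays.asList(params.getProtocols());
    }

    /** Cipher suites enabled by default, in the JVM's order of preference. */
    static List<String> enabledCipherSuites() throws NoSuchAlgorithmException {
        SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
        return Arrays.asList(params.getCipherSuites());
    }

    public static void main(String[] args) throws Exception {
        System.out.println("Protocols: " + enabledProtocols());
        System.out.println("Cipher suites:");
        for (String suite : enabledCipherSuites()) {
            System.out.println("  " + suite);
        }
    }
}
```

Comparing this list with what a browser sends (for example in a Wireshark capture of its ClientHello) shows how far the two handshakes differ.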

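So using this site from Java is quite difficult. Even though the code above works right now, the protection system can be adjusted at any time so that it no longer works.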
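Since the gateway may stop answering at any time, it can help to bound the wait explicitly and to check the HTTP status before writing the file. The sketch below uses only `java.net.HttpURLConnection`; it does not get around the fingerprinting and there is no guarantee the site will answer it. The class name `BoundedDownload`, the timeout values, and the relative output path `Output.txt` are assumptions for illustration. If you stay with Commons IO, `FileUtils.copyURLToFile(URL, File, int, int)` takes connection and read timeouts in the same way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

// Hypothetical helper: download with explicit timeouts and request headers.
class BoundedDownload {

    /** Creates a connection object with timeouts and headers set; nothing is sent yet. */
    static HttpURLConnection configure(URL url, String userAgent,
                                       int connectTimeoutMs, int readTimeoutMs) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(connectTimeoutMs);
        conn.setReadTimeout(readTimeoutMs);
        conn.setRequestProperty("User-Agent", userAgent);
        conn.setRequestProperty("Accept-Language", "en-US");
        return conn;
    }

    /** Sends the request and copies the body to target, failing on any non-200 status. */
    static void download(HttpURLConnection conn, Path target) throws IOException {
        try {
            int status = conn.getResponseCode();
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("Unexpected HTTP status " + status);
            }
            try (InputStream in = conn.getInputStream()) {
                Files.copy(in, target, StandardCopyOption.REPLACE_EXISTING);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        URL url = new URL("https://www.nseindia.com/");
        // "-" is the user agent from the answer's code; the timeouts are arbitrary choices.
        HttpURLConnection conn = configure(url, "-", 5_000, 10_000);
        download(conn, Paths.get("Output.txt"));
        System.out.println("Successfully Completed.");
    }
}
```

With timeouts set, a blocked request fails with a `SocketTimeoutException` after a known interval, which makes it easier to notice when the site's protection has changed.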

I assume a site like this offers an API intended for programmatic use. Contact them and ask; that is the only advice I can give you.