Not able to download a specific URL in Java
I am writing the following program to download a URL using Apache Commons IO, but I get a read timeout exception.
Exception:
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at sun.security.ssl.InputRecord.readFully(Unknown Source)
at sun.security.ssl.InputRecord.read(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readRecord(Unknown Source)
at sun.security.ssl.SSLSocketImpl.readDataRecord(Unknown Source)
at sun.security.ssl.AppInputStream.read(Unknown Source)
at java.io.BufferedInputStream.fill(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTPHeader(Unknown Source)
at sun.net.www.http.HttpClient.parseHTTP(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(Unknown Source)
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(Unknown Source)
at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(Unknown Source)
at java.net.URL.openStream(Unknown Source)
at org.apache.commons.io.FileUtils.copyURLToFile(FileUtils.java:1456)
at com.touseef.stock.FileDownload.main(FileDownload.java:23)
Program:
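import java.io.File;
import java.io.IOException;
import java.net.MalformedURLException;
import java.net.URL;
import org.apache.commons.io.FileUtils;

String urlStr = "https://www.nseindia.com/";
// Backslashes must be escaped inside a Java string literal.
File file = new File("C:\\User\\WorkSpace\\Output.txt");
URL url;
try {
    url = new URL(urlStr);
    // Opens the URL and copies the response body into the file.
    FileUtils.copyURLToFile(url, file);
    System.out.println("Successfully Completed.");
} catch (MalformedURLException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}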
Every other site I tried downloads fine. Please advise.
I am using the commons-io-2.6 jar.
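This site appears to be protected by some kind of web gateway (a DoS protection service such as Akamai?). Clients seem to be fingerprinted by their TLS connection and their HTTP request headers, and only clients that look like a real web browser are allowed to connect.
The following code uses Apache HttpClient 4.5 and works, at least for now: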
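import java.io.File;
import java.nio.file.Files;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

String urlStr = "https://www.nseindia.com/";
// Backslashes must be escaped inside a Java string literal.
File file = new File("C:\\User\\WorkSpace\\Output.txt");
String userAgent = "-";
CloseableHttpClient httpclient = HttpClients.custom().setUserAgent(userAgent).build();
HttpGet httpget = new HttpGet(urlStr);
httpget.addHeader("Accept-Language", "en-US");
httpget.addHeader("Cookie", "");
System.out.println("Executing request " + httpget.getRequestLine());
try (CloseableHttpResponse response = httpclient.execute(httpget)) {
    System.out.println("----------------------------------------");
    System.out.println(response.getStatusLine());
    // Read the whole response body into memory as a string.
    String body = EntityUtils.toString(response.getEntity());
    System.out.println(body);
    // Files.writeString requires Java 11 or later.
    Files.writeString(file.toPath(), body);
}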
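For readers who would rather avoid the extra dependency, the same kind of request can be sketched with the JDK's own `java.net.http.HttpClient` (Java 11+). This is only an illustration and has not been checked against the site: the class name `JdkDownload`, the helper `buildRequest`, the timeout values and the relative output path `Output.txt` are all assumptions made for the example, and the JDK client negotiates HTTP/2 by default, so the gateway may treat it differently from the Apache client above.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;

class JdkDownload {
    // Hypothetical helper: builds a GET request carrying the same
    // User-Agent and Accept-Language headers as the Apache example.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(20)) // assumed limit for the whole exchange
                .header("User-Agent", "-")
                .header("Accept-Language", "en-US")
                .GET()
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10)) // assumed connect timeout
                .build();
        // Stream the response body straight into a file (path is an assumption).
        HttpResponse<Path> response = client.send(
                buildRequest("https://www.nseindia.com/"),
                HttpResponse.BodyHandlers.ofFile(Path.of("Output.txt")));
        System.out.println(response.statusCode() + " -> " + response.body());
    }
}
```

With explicit timeouts, a blocked request should fail after a bounded wait instead of hanging until the platform default read timeout expires.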
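For example, a request that works in Firefox does not work from Java (because the TLS connection differs in protocols and cipher suites). I tried several combinations with Apache HttpClient, but they failed too (even the same request sent from Fiddler failed).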
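To see what a Java client offers during the TLS handshake, the JVM's default protocol versions and cipher suites can be printed and compared with what a browser sends. The following is a minimal sketch using only the JDK; the class name `TlsDefaults` and the helper `defaults` are made up for the example, and the output depends on the JDK version and its security configuration.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

class TlsDefaults {
    // Hypothetical helper: returns the TLS settings that the JVM's
    // default SSLContext enables when nothing is configured explicitly.
    static SSLParameters defaults() throws NoSuchAlgorithmException {
        return SSLContext.getDefault().getDefaultSSLParameters();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        SSLParameters params = defaults();
        System.out.println("Protocols: " + Arrays.toString(params.getProtocols()));
        System.out.println("Cipher suites:");
        for (String suite : params.getCipherSuites()) {
            System.out.println("  " + suite);
        }
    }
}
```

Any difference between this list and the browser's is one of the signals a fingerprinting gateway could use, which would be consistent with the behaviour described above.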
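So using this site from Java is very difficult. Even though the code above works right now, the protection system can be adjusted at any time so that it stops working.
I assume a site like this offers an API intended for programmatic use. Contacting them and asking is the only advice I can give you.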