Getting stuck at SocketInputStream.socketRead0
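I have a project where I download many pages concurrently in many tasks, which are handled by a ThreadPool (size = 200). All of these tasks use the same method, getPage, to download a page (using Apache HttpClient 4.x, i.e. the org.apache.http packages, and Apache Commons IO):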
public static String getPage(String url) throws IOException {
    HttpUriRequest request = new HttpGet(url);
    HttpResponse response = HTTP_CLIENT_BUILDER.build().execute(request);
    try (InputStream content = response.getEntity().getContent()) {
        return IOUtils.toString(content, "UTF-8");
    }
}
where HTTP_CLIENT_BUILDER is a static field initialized like this:
private static final HttpClientBuilder HTTP_CLIENT_BUILDER = HttpClients.custom()
        .setDefaultRequestConfig(RequestConfig.custom()
                .setSocketTimeout(SOCKET_TIMEOUT_MS) // 60_000
                .setConnectTimeout(CONNECTION_TIMEOUT_MS) // 5_000
                .build());
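Problem statement: at some point (when most of the tasks are done) all remaining threads get stuck in the native method SocketInputStream.socketRead0, so jdb reports all of them as running (well, yes, I would expect a thread inside a native method to show up as running :-)):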
> threads
Group system:
(java.lang.ref.Reference$ReferenceHandler)0xac4 Reference Handler cond. waiting
(java.lang.ref.Finalizer$FinalizerThread)0xac5 Finalizer cond. waiting
(java.lang.Thread)0xac6 Signal Dispatcher running
(java.lang.Thread)0xac7 Java2D Disposer cond. waiting
Group main:
(java.lang.Thread)0xac9 pool-1-thread-5 running
(java.lang.Thread)0xaca pool-1-thread-12 running
(... 12 more threads from ThreadPool ...)
(java.lang.Thread)0xad7 DestroyJavaVM running
> where 0xac9
[1] java.net.SocketInputStream.socketRead0 (native method)
[2] java.net.SocketInputStream.read (SocketInputStream.java:150)
[3] java.net.SocketInputStream.read (SocketInputStream.java:121)
[4] sun.security.ssl.InputRecord.readFully (InputRecord.java:465)
[5] sun.security.ssl.InputRecord.read (InputRecord.java:503)
[6] sun.security.ssl.SSLSocketImpl.readRecord (SSLSocketImpl.java:961)
[7] sun.security.ssl.SSLSocketImpl.performInitialHandshake (SSLSocketImpl.java:1,363)
[8] sun.security.ssl.SSLSocketImpl.startHandshake (SSLSocketImpl.java:1,391)
[9] sun.security.ssl.SSLSocketImpl.startHandshake (SSLSocketImpl.java:1,375)
[10] org.apache.http.conn.ssl.SSLConnectionSocketFactory.createLayeredSocket (SSLConnectionSocketFactory.java:275)
[11] org.apache.http.conn.ssl.SSLConnectionSocketFactory.connectSocket (SSLConnectionSocketFactory.java:254)
[12] org.apache.http.impl.conn.HttpClientConnectionOperator.connect (HttpClientConnectionOperator.java:117)
[13] org.apache.http.impl.conn.PoolingHttpClientConnectionManager.connect (PoolingHttpClientConnectionManager.java:314)
[14] org.apache.http.impl.execchain.MainClientExec.establishRoute (MainClientExec.java:363)
[15] org.apache.http.impl.execchain.MainClientExec.execute (MainClientExec.java:219)
[16] org.apache.http.impl.execchain.ProtocolExec.execute (ProtocolExec.java:195)
[17] org.apache.http.impl.execchain.RetryExec.execute (RetryExec.java:86)
[18] org.apache.http.impl.execchain.RedirectExec.execute (RedirectExec.java:108)
[19] org.apache.http.impl.client.InternalHttpClient.doExecute (InternalHttpClient.java:186)
[20] org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:82)
[21] org.apache.http.impl.client.CloseableHttpClient.execute (CloseableHttpClient.java:106)
[22] <package>.Utils.getPage (Utils.java:122)
[23...] <internal details>
> # the same picture for all of them
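I don't understand why this happens, but I found a Java bug that may be related to the problem. So perhaps I'm not looking for a real solution so much as for a workaround.
Since the bug was filed against Linux, I should say that I'm on Linux too: a virtual machine running Ubuntu 14.04 x86_64.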
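For readers less familiar with the mechanism: socketRead0 is the blocking read underneath every java.net.Socket input stream, and in the trace above it is reached from the TLS handshake. A read on a socket whose SO_TIMEOUT is 0 waits indefinitely if the peer stays silent. The following is a minimal JDK-only sketch of that behaviour, not the HttpClient code path; the class and method names are made up for illustration, and the "silent server" is a local stand-in for an unresponsive host.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

class ReadTimeoutDemo {
    /**
     * Connects to a local server that accepts the connection but never sends
     * a byte, then reads with the given SO_TIMEOUT (must be > 0 here; 0 would
     * block forever). Returns true if the read gave up with
     * SocketTimeoutException.
     */
    static boolean readTimesOut(int soTimeoutMs) {
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            client.setSoTimeout(soTimeoutMs);
            try {
                client.getInputStream().read(); // nothing ever arrives
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(200));
    }
}
```

If the same pattern holds for the stuck threads, the question becomes whether any read timeout is in effect on the socket at the moment of the handshake; that is an assumption worth checking, not something the trace proves.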
UPD: OK, what I have tried now is adding a new timeout with setConnectionRequestTimeout (just to make sure that it doesn't help) and adding a finally block to getPage:
...
try (InputStream content = response.getEntity().getContent()) {
    return IOUtils.toString(content, "UTF-8");
} finally {
    httpClient.getConnectionManager().closeIdleConnections(0, TimeUnit.NANOSECONDS);
}
Let's see whether it helps.
UPD2: It seems to have helped a bit, but this permanently running job still gets stuck about once a day.
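Unfortunately, I did not find any simple workaround (or a real solution), so I had to write my own workaround, a small manager thread, which I hope will help anyone who runs into this bug: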
Create the class ConnectionsSupervisor:
private static class ConnectionsSupervisor extends Thread {

    // Requests currently in flight, each with the time after which it gets aborted.
    private final Set<RequestEntry> streams = new CopyOnWriteArraySet<>();

    public ConnectionsSupervisor() {
        setDaemon(true); // must not keep the JVM alive
        setName("Connections supervisor");
    }

    @Override
    public void run() {
        while (true) {
            try {
                Thread.sleep(CONNECTIONS_SUPERVISOR_WAIT_MS);
            } catch (InterruptedException ignored) {
            }
            long time = timestamp();
            // Abort every request that has outlived its hard deadline.
            streams.stream().filter(entry -> time > entry.timeoutBorder).forEach(entry -> {
                HttpUriRequest request = entry.request;
                System.err.format("HttpUriRequest killed after timeout (%d sec.) exceeded: %s%n",
                        FULL_CONNECTION_TIMEOUT_S,
                        request);
                request.abort();
            });
        }
    }

    public void addRequest(HttpUriRequest request) {
        streams.add(new RequestEntry(timestamp() + FULL_CONNECTION_TIMEOUT_S, request));
    }

    public void removeRequest(HttpUriRequest request) {
        streams.removeIf(entry -> entry.request == request);
    }

    private static class RequestEntry {
        private final long timeoutBorder; // epoch seconds
        private final HttpUriRequest request;

        public RequestEntry(long timeoutBorder, HttpUriRequest request) {
            this.timeoutBorder = timeoutBorder;
            this.request = request;
        }
    }
}

// Current time in epoch seconds.
public static long timestamp() {
    return Instant.now().getEpochSecond();
}
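The supervisor works because a read that is blocked in native code can still be released from another thread by tearing down the connection. The same effect can be seen with plain JDK sockets: Socket.close() is documented to make any thread blocked in I/O on that socket throw a SocketException. The sketch below illustrates only that principle; it does not use HttpClient, and HardDeadlineDemo, readIsReleasedByClose and the local silent server are hypothetical names and setup chosen for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class HardDeadlineDemo {
    /**
     * Blocks in read() on a socket with no SO_TIMEOUT while a watchdog thread
     * closes that socket after deadlineMs. Returns true if the blocked read
     * was released with a SocketException.
     */
    static boolean readIsReleasedByClose(long deadlineMs) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try (ServerSocket silentServer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(), silentServer.getLocalPort());
             Socket accepted = silentServer.accept()) {
            // The watchdog plays the supervisor's role: enforce a hard deadline.
            watchdog.schedule(() -> {
                try {
                    client.close();
                } catch (IOException ignored) {
                }
            }, deadlineMs, TimeUnit.MILLISECONDS);
            try {
                client.getInputStream().read(); // on its own this would block forever
                return false;
            } catch (SocketException closedByWatchdog) {
                return true;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            watchdog.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("released by close: " + readIsReleasedByClose(200));
    }
}
```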
Somewhere there should be an instance of ConnectionsSupervisor, something like:
private static final ConnectionsSupervisor connectionsSupervisor = new ConnectionsSupervisor();

static {
    connectionsSupervisor.start();
}
In a method like getPage:
HttpUriRequest request = ...;
// ...
connectionsSupervisor.addRequest(request);
try (InputStream content = httpClient.execute(request).getEntity().getContent()) {
    return IOUtils.toString(content, "UTF-8");
    // or any other usage
} finally {
    connectionsSupervisor.removeRequest(request);
    // highly important!
}
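Because forgetting the removeRequest call leaks entries and aborts requests that have already finished, one possible variation is to hand out an AutoCloseable registration so that try-with-resources does the cleanup. This is only a sketch of that design under my own assumptions: DeadlineGuard, Registration and abortAfter are hypothetical names, the abort action is a plain Runnable (in getPage it could be request::abort), and a ScheduledExecutorService stands in for the polling thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

class DeadlineGuard {
    // One daemon timer thread plays the role of the supervisor thread.
    private static final ScheduledExecutorService TIMER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "deadline-guard");
                t.setDaemon(true);
                return t;
            });

    /** Handle whose close() cancels the pending abort; it never throws. */
    interface Registration extends AutoCloseable {
        @Override
        void close();
    }

    /** Schedules abortAction to run once the hard deadline has passed. */
    static Registration abortAfter(long timeout, TimeUnit unit, Runnable abortAction) {
        ScheduledFuture<?> pending = TIMER.schedule(abortAction, timeout, unit);
        return () -> pending.cancel(false);
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicBoolean aborted = new AtomicBoolean(false);
        // () -> aborted.set(true) is a stand-in for request::abort.
        try (Registration guard = abortAfter(5, TimeUnit.SECONDS, () -> aborted.set(true))) {
            Thread.sleep(100); // stand-in for executing the request and reading the body
        }
        System.out.println("aborted: " + aborted.get());
    }
}
```

The trade-off compared with the polling thread above is that each request gets its own scheduled task instead of being checked once per polling interval, and the deregistration cannot be skipped by accident.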