How to prevent hangs on SocketInputStream.socketRead0 in Java?

Performing millions of HTTP requests with different Java libraries leaves me with threads hung on:

java.net.SocketInputStream.socketRead0()

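which is a native function.

I tried to set up Apache HttpClient and RequestConfig with timeouts for (I hope) every possible case, but I still get (probably infinite) hangs on socketRead0. How do I get rid of them?

The hang rate is about 1 per 10,000 requests (to 10,000 different hosts), and a hang can probably last forever (I have confirmed that threads were still hung after 10 hours).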

JDK 1.8 Windows 7.

My HttpClient factory:

    SocketConfig socketConfig = SocketConfig.custom()
            .setSoKeepAlive(false)
            .setSoLinger(1)
            .setSoReuseAddress(true)
            .setSoTimeout(5000)
            .setTcpNoDelay(true).build();

    HttpClientBuilder builder = HttpClientBuilder.create();
    builder.disableAutomaticRetries();
    builder.disableContentCompression();
    builder.disableCookieManagement();
    builder.disableRedirectHandling();
    builder.setConnectionReuseStrategy(new NoConnectionReuseStrategy());
    builder.setDefaultSocketConfig(socketConfig);

    // Build from the configured builder; a fresh HttpClientBuilder.create()
    // would silently discard every setting above.
    return builder.build();

My RequestConfig factory:

    HttpGet request = new HttpGet(url);

    RequestConfig config = RequestConfig.custom()
            .setCircularRedirectsAllowed(false)
            .setConnectionRequestTimeout(8000)
            .setConnectTimeout(4000)
            .setMaxRedirects(1)
            .setRedirectsEnabled(true)
            .setSocketTimeout(5000)
            .setStaleConnectionCheckEnabled(true).build();
    request.setConfig(config);

    // Return the configured request; a new HttpGet(url) would carry no RequestConfig.
    return request;

OpenJDK socketRead0 source

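Note: I actually have a "trick": I can schedule .getConnectionManager().shutdown() in another thread and cancel the Future if the request finishes properly. However, it is deprecated, and it also kills the whole HttpClient, not just that single request.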

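Since nobody else has responded so far, here is my take.

Your timeout settings look perfectly fine to me. The reason certain requests appear to be constantly blocked in a java.net.SocketInputStream#socketRead0() call is likely a combination of misbehaving servers and your local configuration. The socket timeout defines the maximum period of inactivity between two consecutive I/O read operations (or, in other words, between two consecutive incoming packets). Your socket timeout is set to 5,000 milliseconds. As long as the opposite endpoint keeps sending a packet every 4,999 milliseconds for a chunk-encoded message, the request will never time out and will end up spending most of its time blocked in java.net.SocketInputStream#socketRead0(). You can find out whether that is the case by running HttpClient with wire logging turned on.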
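
To make the point about the socket timeout concrete, here is a minimal sketch that uses only the JDK. It assumes a local test server that trickles out one byte at a time; the class name DripFeedDemo, the method readSlowResponse and all the numbers are made up for illustration and are not part of HttpClient. Every gap between bytes stays below SO_TIMEOUT, so the read loop should never throw SocketTimeoutException even though the whole response takes longer than the timeout:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class DripFeedDemo {

    /**
     * Starts a local peer that sends `count` bytes, one every `gapMillis`,
     * and reads them with SO_TIMEOUT set to `soTimeoutMillis`.
     * Returns the total time in milliseconds the client spent reading.
     */
    static long readSlowResponse(int count, long gapMillis, int soTimeoutMillis) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread slowPeer = new Thread(() -> {
                try (Socket s = server.accept(); OutputStream out = s.getOutputStream()) {
                    for (int i = 0; i < count; i++) {
                        Thread.sleep(gapMillis); // stay just active enough
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException ignored) {
                    // Demo only: the client will simply see the stream end.
                }
            });
            slowPeer.setDaemon(true);
            slowPeer.start();

            long start = System.nanoTime();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMillis);
                InputStream in = client.getInputStream();
                while (in.read() != -1) {
                    // Each individual read returns well before SO_TIMEOUT expires.
                }
            }
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // A SocketTimeoutException would end up here as well.
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, DripFeedDemo.readSlowResponse(15, 100, 1000) keeps the client reading for about 1.5 seconds with a 1-second socket timeout. A per-read timeout therefore has to be paired with an overall deadline that is enforced from outside the read.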

You should consider a non-blocking HTTP client such as Grizzly or Netty, which has no blocking operations that can hang a thread.

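As already suggested, you should consider a non-blocking HTTP client or (seeing that you are using Apache HttpClient) implement multithreaded request execution to keep the main application thread from hanging (this does not solve the problem, but it is better than restarting your application because it has frozen). In any case, you set the setStaleConnectionCheckEnabled property, but the stale connection check is not 100% reliable. From the Apache HttpClient tutorial: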

One of the major shortcomings of the classic blocking I/O model is that the network socket can react to I/O events only when blocked in an I/O operation. When a connection is released back to the manager, it can be kept alive however it is unable to monitor the status of the socket and react to any I/O events. If the connection gets closed on the server side, the client side connection is unable to detect the change in the connection state (and react appropriately by closing the socket on its end).

HttpClient tries to mitigate the problem by testing whether the connection is 'stale', that is no longer valid because it was closed on the server side, prior to using the connection for executing an HTTP request. The stale connection check is not 100% reliable and adds 10 to 30 ms overhead to each request execution.

The Apache HttpComponents crew recommends implementing a connection eviction policy:

The only feasible solution that does not involve a one thread per socket model for idle connections is a dedicated monitor thread used to evict connections that are considered expired due to a long period of inactivity. The monitor thread can periodically call ClientConnectionManager#closeExpiredConnections() method to close all expired connections and evict closed connections from the pool. It can also optionally call ClientConnectionManager#closeIdleConnections() method to close all connections that have been idle over a given period of time.

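Take a look at the sample code in the connection eviction policy section and try to implement it in your application together with multithreaded request execution. I think implementing both mechanisms will prevent your unwanted hangs.

Although this question mentions Windows, I have the same problem on Linux. There appears to be a flaw in the way the JVM implements blocking socket timeouts.

To summarize, the timeout for blocking sockets is implemented by calling poll on Linux (and select on Windows) to determine that data is available before calling recv. However, at least on Linux, both methods can spuriously indicate that data is available when it is not, which leads to recv blocking indefinitely.

From the BUGS section of the poll(2) man page: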

See the discussion of spurious readiness notifications under the BUGS section of select(2).

From the BUGS section of the select(2) man page:

Under Linux, select() may report a socket file descriptor as "ready for reading", while nevertheless a subsequent read blocks. This could for example happen when data has arrived but upon examination has wrong checksum and is discarded. There may be other circumstances in which a file descriptor is spuriously reported as ready. Thus it may be safer to use O_NONBLOCK on sockets that should not block.

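The Apache HTTP Client code is a bit hard to follow, but it appears that connection expiration is only set for HTTP keep-alive connections (which you have disabled) and is indefinite unless the server specifies otherwise. Therefore, as pointed out by oleg, the connection eviction policy approach won't work in your case and can't be relied upon in general.

For Apache HTTP Client (blocking), the best solution I found is to call getConnectionManager() and shut it down.

So in a high-reliability solution I simply schedule the shutdown in another thread, and if the request has not completed by then, I shut it down from that other thread.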
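
The idea of a shutdown scheduled from another thread can be sketched with plain sockets. This is only an illustration under assumptions: ReadWatchdog and readWithHardDeadline are hypothetical names, the "request" is a bare read from a local peer that never answers, and the watchdog closes just that one socket rather than the whole connection manager. It relies on the documented behaviour of Socket.close(), which makes a thread blocked in I/O on that socket throw a SocketException:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class ReadWatchdog {

    /**
     * Reads from a peer that never sends anything, with no SO_TIMEOUT set.
     * A watchdog thread closes the socket after deadlineMillis.
     * Returns true if the blocked read was broken by the watchdog.
     */
    static boolean readWithHardDeadline(long deadlineMillis) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {

            // Hard deadline: close the socket from another thread.
            ScheduledFuture<?> kill = watchdog.schedule(() -> {
                try {
                    socket.close();
                } catch (IOException ignored) {
                    // Nothing useful to do if closing fails in this demo.
                }
            }, deadlineMillis, TimeUnit.MILLISECONDS);

            try {
                socket.getInputStream().read(); // would otherwise block forever
                kill.cancel(false);             // finished in time: disarm the watchdog
                return false;
            } catch (IOException e) {
                return true;                    // read was unblocked by close()
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            watchdog.shutdownNow();
        }
    }
}
```

Calling ReadWatchdog.readWithHardDeadline(300) should return true after roughly 300 ms. With HttpClient, the same pattern would abort the single request instead of closing a raw socket.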

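I have more than 50 machines that make about 200k requests/day/machine. They are running Amazon Linux AMI 2017.03. I previously had jdk1.8.0_102, and now I have jdk1.8.0_131. I am using both Apache HttpClient and OkHttp as scraping libraries.

Each machine was running 50 threads, and sometimes threads got lost. After profiling with the YourKit Java profiler, I got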

ScraperThread42 State: RUNNABLE CPU usage on sample: 0ms
java.net.SocketInputStream.socketRead0(FileDescriptor, byte[], int, int, int) SocketInputStream.java (native)
java.net.SocketInputStream.socketRead(FileDescriptor, byte[], int, int, int) SocketInputStream.java:116
java.net.SocketInputStream.read(byte[], int, int, int) SocketInputStream.java:171
java.net.SocketInputStream.read(byte[], int, int) SocketInputStream.java:141
okio.Okio.read(Buffer, long) Okio.java:139
okio.AsyncTimeout.read(Buffer, long) AsyncTimeout.java:211
okio.RealBufferedSource.indexOf(byte, long) RealBufferedSource.java:306
okio.RealBufferedSource.indexOf(byte) RealBufferedSource.java:300
okio.RealBufferedSource.readUtf8LineStrict() RealBufferedSource.java:196
okhttp3.internal.http1.Http1Codec.readResponse() Http1Codec.java:191
okhttp3.internal.connection.RealConnection.createTunnel(int, int, Request, HttpUrl) RealConnection.java:303
okhttp3.internal.connection.RealConnection.buildTunneledConnection(int, int, int, ConnectionSpecSelector) RealConnection.java:156
okhttp3.internal.connection.RealConnection.connect(int, int, int, List, boolean) RealConnection.java:112
okhttp3.internal.connection.StreamAllocation.findConnection(int, int, int, boolean) StreamAllocation.java:193
okhttp3.internal.connection.StreamAllocation.findHealthyConnection(int, int, int, boolean, boolean) StreamAllocation.java:129
okhttp3.internal.connection.StreamAllocation.newStream(OkHttpClient, boolean) StreamAllocation.java:98
okhttp3.internal.connection.ConnectInterceptor.intercept(Interceptor$Chain) ConnectInterceptor.java:42
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.internal.http.BridgeInterceptor.intercept(Interceptor$Chain) BridgeInterceptor.java:93
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(Interceptor$Chain) RetryAndFollowUpInterceptor.java:124
okhttp3.internal.http.RealInterceptorChain.proceed(Request, StreamAllocation, HttpCodec, Connection) RealInterceptorChain.java:92
okhttp3.internal.http.RealInterceptorChain.proceed(Request) RealInterceptorChain.java:67
okhttp3.RealCall.getResponseWithInterceptorChain() RealCall.java:198
okhttp3.RealCall.execute() RealCall.java:83

I found out that there is a fix for this

https://bugs.openjdk.java.net/browse/JDK-8172578

in JDK 8u152 (early access). I have installed it on one of our machines. Now I am waiting to see some good results.

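I ran into the same issue using the Apache Commons HTTP client.

There is a pretty simple workaround (which doesn't require shutting down the connection manager):

To reproduce it, the request from the question needs to be executed in a new thread, paying attention to the details:

  • run the request in a separate thread, close the request and release its connection from a different thread, and interrupt the hanging thread
  • don't run EntityUtils.consumeQuietly(response.getEntity()) in the finally block (because it hangs on the 'dead' connection)

First, add the interface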

interface RequestDisposer {
    void dispose();
}

Execute the HTTP request in a new thread

final AtomicReference<RequestDisposer> requestDisposer = new AtomicReference<>(null);

final Thread thread = new Thread(() -> {
    final HttpGet request = new HttpGet("http://my.url");
    // Aborting the request and releasing its connection unblocks a read stuck in socketRead0.
    final RequestDisposer disposer = () -> {
        request.abort();
        request.releaseConnection();
    };
    requestDisposer.set(disposer);

    try (final CloseableHttpResponse response = httpClient.execute(request)) {
        ...
    } catch (IOException e) {
        // Also reached when dispose() aborts the request from another thread.
    } finally {
        disposer.dispose();
    }
});
thread.start();

Call dispose() in the main thread to close the hung connection

requestDisposer.get().dispose(); // better check if it's not null first
thread.interrupt();
thread.join();

This solved the problem for me.

My stack trace looked like this:

java.lang.Thread.State: RUNNABLE
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at org.apache.http.impl.io.ChunkedInputStream.getChunkSize(ChunkedInputStream.java:253)
at org.apache.http.impl.io.ChunkedInputStream.nextChunk(ChunkedInputStream.java:227)
at org.apache.http.impl.io.ChunkedInputStream.read(ChunkedInputStream.java:186)
at org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:137)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)

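For anyone who might be interested: it is easily reproducible by interrupting the thread without aborting the request and releasing the connection (the ratio is about 1/100). Windows 10, version 10.0, jdk8.151-x64.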

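I feel that all these answers are way too specific.

We have to note that this is probably a real JVM bug. It should be possible to get the file descriptor and close it. All this talk about timeouts is too high level. You don't want a timeout to the extent that the connection fails; what you want is the ability to hard-break this stuck thread and stop or interrupt it.

The way the JVM should have implemented the SocketInputStream.socketRead function is to set some internal default timeout, which should be even as low as 1 second. Then, when the timeout comes, immediately loop back to socketRead0. While that is happening, the Thread.interrupt and Thread.stop commands can take effect.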
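
A user-level approximation of that loop might look like the sketch below. It is not how the JVM implements socketRead; SlicedRead, readInterruptibly and demo are hypothetical names, and the 100 ms slice is an arbitrary choice. The read is cut into short SO_TIMEOUT slices, and the interrupt flag is checked between slices. Note that this still depends on SO_TIMEOUT actually firing, so it would not help in cases where the timeout itself fails:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.util.concurrent.atomic.AtomicReference;

final class SlicedRead {

    /** Reads one byte, waking up every sliceMillis so an interrupt is noticed. */
    static int readInterruptibly(Socket socket, int sliceMillis) throws IOException {
        socket.setSoTimeout(sliceMillis);
        InputStream in = socket.getInputStream();
        while (true) {
            if (Thread.currentThread().isInterrupted()) {
                throw new InterruptedIOException("read interrupted");
            }
            try {
                return in.read();
            } catch (SocketTimeoutException sliceExpired) {
                // No data within this slice: loop and re-check the interrupt flag.
            }
        }
    }

    /** Demo: a reader waiting on a silent local peer is stopped by Thread.interrupt(). */
    static String demo() {
        AtomicReference<String> outcome = new AtomicReference<>("still blocked");
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket socket = new Socket(silentPeer.getInetAddress(), silentPeer.getLocalPort())) {
            Thread reader = new Thread(() -> {
                try {
                    readInterruptibly(socket, 100);
                    outcome.set("returned data");
                } catch (InterruptedIOException e) {
                    outcome.set("interrupted");
                } catch (IOException e) {
                    outcome.set("failed: " + e);
                }
            });
            reader.start();
            Thread.sleep(300);   // let the reader go through a few slices
            reader.interrupt();
            reader.join(2000);
            return outcome.get();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch SlicedRead.demo() should report "interrupted" within about one slice of the interrupt() call.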

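The even better way to do this, of course, is not to do any blocking wait at all, but instead to use the select(2) system call with a list of file descriptors and, when any one of them has data available, let it perform the read operation.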
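
From Java code, the closest standard equivalent of select(2) is java.nio.channels.Selector. The following is a minimal sketch of a read that never blocks in recv, assuming a single channel and a caller-supplied deadline; SelectorRead, readWithTimeout and demo are hypothetical names, and the demo talks to a local peer that never sends data:

```java
import java.io.IOException;
import java.io.InterruptedIOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;

final class SelectorRead {

    /**
     * Waits up to timeoutMillis for data on a non-blocking channel.
     * Returns the number of bytes read, -1 at end of stream, or 0 on timeout.
     * Assumes dst has space remaining.
     */
    static int readWithTimeout(SocketChannel channel, ByteBuffer dst, long timeoutMillis)
            throws IOException {
        channel.configureBlocking(false);
        try (Selector selector = Selector.open()) {
            channel.register(selector, SelectionKey.OP_READ);
            long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
            while (true) {
                long remainingMillis = (deadline - System.nanoTime()) / 1_000_000L;
                if (remainingMillis <= 0) {
                    return 0; // deadline reached without any data
                }
                if (Thread.currentThread().isInterrupted()) {
                    throw new InterruptedIOException("wait interrupted");
                }
                selector.select(remainingMillis);
                selector.selectedKeys().clear();
                // Non-blocking read: a spurious wakeup just yields 0 and we wait again.
                int n = channel.read(dst);
                if (n != 0) {
                    return n;
                }
            }
        }
    }

    /** Demo: reading from a silent local peer gives up after 300 ms and returns 0. */
    static int demo() {
        try (ServerSocket silentPeer = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             SocketChannel channel = SocketChannel.open(new InetSocketAddress(
                     silentPeer.getInetAddress(), silentPeer.getLocalPort()))) {
            return readWithTimeout(channel, ByteBuffer.allocate(64), 300);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the channel is in non-blocking mode, a spurious readiness notification costs one extra loop iteration instead of an indefinite block, which is the advice the select(2) man page gives about O_NONBLOCK.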

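Just look around the internet at all the people having trouble with threads stuck in java.net.SocketInputStream#socketRead0; it is the most popular topic about java.net.SocketInputStream!

So, while the bug is not fixed, I wonder about the dirtiest trick I can come up with to break up this situation. Something like connecting with the debugger interface to get to the stack frame of the socketRead call, grabbing the FileDescriptor, breaking into it to get the int fd number, and then making a native close(2) call on that fd.

Do we have a chance to do that? (Don't tell me "it's not good practice.") If so, let's do it!

I had the same issue today. Based on @Sergei Voitovich's answer, I tried to make it work while still using Apache HTTP Client.

Since I am using Java 8, it is simpler to set a timeout that aborts the connection.

Here is a draft of the implementation: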

private HttpResponse executeRequest(Request request){
    InterruptibleRequestExecution requestExecution = new InterruptibleRequestExecution(request, executor);
    ExecutorService executorService = Executors.newSingleThreadExecutor();
    try {
        return executorService.submit(requestExecution).get(<your timeout in milliseconds>, TimeUnit.MILLISECONDS);
    } catch (TimeoutException | ExecutionException e) {
        // Your request timed out, you can throw an exception here if you want
        throw new UsefulExceptionForYourApplication(e);
    } catch (InterruptedException e) {
        // Always remember to call interrupt after catching InterruptedException
        Thread.currentThread().interrupt();
        throw new UsefulExceptionForYourApplication(e);
    } finally {
        // Force-stop the single-thread pool created by Executors.newSingleThreadExecutor()
        // and abort the pending request inside that thread. If the request is hanging in
        // socketRead0, it stops and the thread is terminated as well.
        forceStopIdleThreadsAndRequests(requestExecution, executorService);
    }
}

private void forceStopIdleThreadsAndRequests(InterruptibleRequestExecution execution,
                                             ExecutorService executorService) {
    execution.abortRequest();
    executorService.shutdownNow();
}

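The code above creates a new thread to execute the request using org.apache.http.client.fluent.Executor. The timeout can be easily configured.

The execution of the thread is defined in InterruptibleRequestExecution, which you can see below.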

private static class InterruptibleRequestExecution implements Callable<HttpResponse> {
    private final Request request;
    private final Executor executor;
    private final RequestDisposer disposer;

    public InterruptibleRequestExecution(Request request, Executor executor) {
        this.request = request;
        this.executor = executor;
        this.disposer = request::abort;
    }

    @Override
    public HttpResponse call() {
        try {
            return executor.execute(request).returnResponse();
        } catch (IOException e) {
            throw new UsefulExceptionForYourApplication(e);
        } finally {
            disposer.dispose();
        }
    }

    public void abortRequest() {
        disposer.dispose();
    }

    @FunctionalInterface
    interface RequestDisposer {
        void dispose();
    }
}

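It works really well. We had cases where some connections hung in socketRead0 for 7 hours! Now it never exceeds the defined timeout, and it has been working in production with millions of requests per day without any problems.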