Spring Boot 2.0.1 升级后疯狂的 HTTP 问题

Crazy HTTP problems after Spring Boot 2.0.1 upgrade

升级到 Spring Boot 2.0.1 后,我的应用程序遇到莫名其妙的问题。

涉及两个应用程序。对 App 1 的请求对 App 2 进行后端调用以获取一些数据。对 App 2 的调用通过 AWS 弹性负载均衡器 (ELB)。用于进行调用的客户端是由 my utility 包装的 Apache Commons HttpClient。

应用程序 1 升级到 Boot 2.0.1 后,我发现从应用程序 1 到应用程序 2 的一小部分调用挂起很长时间(15 分钟)。当我使用 JConsole 获取挂起线程的线程转储时,我看到了这个堆栈跟踪:

Stack trace: 
java.net.PlainSocketImpl.socketClose0(Native Method)
java.net.AbstractPlainSocketImpl.socketPreClose(AbstractPlainSocketImpl.java:693)
java.net.AbstractPlainSocketImpl.close(AbstractPlainSocketImpl.java:530)
   - locked java.lang.Object@29a91ad3
java.net.SocksSocketImpl.close(SocksSocketImpl.java:1075)
java.net.Socket.close(Socket.java:1495)
   - locked java.lang.Object@71e238ff
   - locked java.net.Socket@73f58606
sun.security.ssl.BaseSSLSocketImpl.close(BaseSSLSocketImpl.java:624)
   - locked sun.security.ssl.SSLSocketImpl@22264f18
sun.security.ssl.SSLSocketImpl.closeSocket(SSLSocketImpl.java:1585)
sun.security.ssl.SSLSocketImpl.closeInternal(SSLSocketImpl.java:1723)
sun.security.ssl.SSLSocketImpl.recvAlert(SSLSocketImpl.java:2020)
sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:1135)
   - locked sun.security.ssl.SSLSocketImpl@22264f18
   - locked java.lang.Object@4338a60d
sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:940)
sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
   - locked sun.security.ssl.AppInputStream@23fd5b55
org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:153)
org.apache.http.impl.BHttpConnectionBase.fillInputBuffer(BHttpConnectionBase.java:344)
org.apache.http.impl.BHttpConnectionBase.isStale(BHttpConnectionBase.java:364)
org.apache.http.impl.conn.CPool.validate(CPool.java:71)
org.apache.http.impl.conn.CPool.validate(CPool.java:45)
org.apache.http.pool.AbstractConnPool.get(AbstractConnPool.java:249)
   - locked org.apache.http.pool.AbstractConnPool@7c672c9a
org.apache.http.pool.AbstractConnPool.get(AbstractConnPool.java:193)
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.leaseConnection(PoolingHttpClientConnectionManager.java:282)
org.apache.http.impl.conn.PoolingHttpClientConnectionManager.get(PoolingHttpClientConnectionManager.java:269)
org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:191)
org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:72)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:221)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:165)
org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:140)
com.kendelong.util.http.HttpConnectionService.doExecuteAndGetResponse(HttpConnectionService.java:243)
com.kendelong.util.http.HttpConnectionService.getResult(HttpConnectionService.java:189)
com.kendelong.util.http.IHttpConnectionService$getResult[=11=].call(Unknown Source)
com.hatchbaby.sub.util.MainSiteHttpProxyService.getMemberData(MainSiteHttpProxyService.groovy:68)
com.hatchbaby.sub.util.MainSiteHttpProxyService$$FastClassBySpringCGLIB$cc5cf68.invoke(<generated>)
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:747)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
org.springframework.aop.aspectj.MethodInvocationProceedingJoinPoint.proceed(MethodInvocationProceedingJoinPoint.java:89)
com.kendelong.util.performance.PerformanceMonitoringAspect.monitorInvocation(PerformanceMonitoringAspect.java:108)
com.kendelong.util.performance.PerformanceMonitoringAspect.monitorAnnotatedClasses(PerformanceMonitoringAspect.java:83)
sun.reflect.GeneratedMethodAccessor177.invoke(Unknown Source)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:498)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethodWithGivenArgs(AbstractAspectJAdvice.java:644)
org.springframework.aop.aspectj.AbstractAspectJAdvice.invokeAdviceMethod(AbstractAspectJAdvice.java:633)
org.springframework.aop.aspectj.AspectJAroundAdvice.invoke(AspectJAroundAdvice.java:70)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:174)
org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:92)
org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:185)
org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:689)
com.hatchbaby.sub.util.MainSiteHttpProxyService$$EnhancerBySpringCGLIB$$edc027d4.getMemberData(<generated>)
com.hatchbaby.sub.service.security.HatchBabyAuthenticationUserDetailsService.loadUserDetails(HatchBabyAuthenticationUserDetailsService.java:39)
com.hatchbaby.sub.service.security.HatchBabyAuthenticationUserDetailsService.loadUserDetails(HatchBabyAuthenticationUserDetailsService.java:20)
org.springframework.security.web.authentication.preauth.PreAuthenticatedAuthenticationProvider.authenticate(PreAuthenticatedAuthenticationProvider.java:103)
org.springframework.security.authentication.ProviderManager.authenticate(ProviderManager.java:174)
org.springframework.security.web.authentication.preauth.AbstractPreAuthenticatedProcessingFilter.doAuthenticate(AbstractPreAuthenticatedProcessingFilter.java:184)
org.springframework.security.web.authentication.preauth.AbstractPreAuthenticatedProcessingFilter.doFilter(AbstractPreAuthenticatedProcessingFilter.java:118)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:105)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.context.request.async.WebAsyncManagerIntegrationFilter.doFilterInternal(WebAsyncManagerIntegrationFilter.java:56)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:334)
org.springframework.security.web.FilterChainProxy.doFilterInternal(FilterChainProxy.java:215)
org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:178)
org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:357)
org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:270)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.HttpPutFormContentFilter.doFilterInternal(HttpPutFormContentFilter.java:109)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:81)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.AbstractRequestLoggingFilter.doFilterInternal(AbstractRequestLoggingFilter.java:245)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
com.hatchbaby.sub.web.RequestInfoFilter.doFilter(RequestInfoFilter.java:99)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200)
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
com.hatchbaby.sub.web.SpecialRedirectFilter.doFilter(SpecialRedirectFilter.java:43)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.ebaysf.web.cors.CORSFilter.handleSimpleCORS(CORSFilter.java:302)
org.ebaysf.web.cors.CORSFilter.doFilter(CORSFilter.java:160)
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193)
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166)
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:198)
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96)
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:496)
org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:650)
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:140)
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81)
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87)
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:342)
org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:803)
org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:66)
org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:790)
org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1459)
org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
   - locked org.apache.tomcat.util.net.NioEndpoint$NioSocketWrapper@6ca9f281
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
java.lang.Thread.run(Thread.java:748)

线程在此状态下挂起大约 15 分钟(实际上是 15 分 44 秒 plus/minus 5 秒),或其大致整数倍。最终请求成功。

更奇怪的是,当我比较日志时,我看到请求进入 App 1 并调用 App 2(例如,在 8:00)。如线程转储中所示,线程在 socketClose0() 方法中挂起。当请求在 15 分钟后(8:15)最终完成时,App 2 及其 ELB 日志中的时间戳显示请求到达的时间较晚(8:15):就好像套接字是尝试关闭 15 分钟,它最终关闭(没有联系 App 2 服务器),并立即重试。通常它会成功,但有时它会再次卡住 15 分钟(因此是 15 分钟的倍数)。但是请求直到 15 分钟后才在 App 2 的 ELB 上注册 (8:15)。

我在我的 QA 环境(每台服务器)和我的生产环境(每台 2 台服务器)中都看到过这种行为。如果我将代码回滚到 Spring Boot 1.5.9 版本,问题就会消失。但是,我已经将大部分可疑的库一一还原到Boot 1.5.9版本,问题依旧。我已经恢复了 httpclient(显然)、groovy、tomcat 和 commons-codec。我无法恢复 Spring(重置 spring.version 属性),因为应用程序无法启动。作为升级到Boot 2.0.1的一部分,我从JGroups分发的EhCache切换到Hizelcast作为Hibernate二级缓存,但我已经从应用程序中完全删除了Hazelcast,问题仍然存在。由于代码更改,恢复 Hibernate 或 Spring Data JPA 更加困难。

当然,我的应用程序没有将超时设置为 15 分钟。 httpclient 的连接和套接字超时均为 5 秒。

有没有人有任何线索或任何故障排除建议?

更新

设置 -Djavax.net.debug=all 产生此输出:

http-nio-8080-exec-6, setSoTimeout(1) called
[Raw read]: length = 5
0000: 15 03 03 00 1A                                     .....
[Raw read]: length = 26
0000: 98 1C 7C C4 27 0B 2B 40   E5 AC D7 38 A9 6F F7 73  ....'.+@...8.o.s
0010: 83 18 E8 15 14 7C 52 10   47 5E                    ......R.G^
http-nio-8080-exec-6, READ: TLSv1.2 Alert, length = 26
Padded plaintext after DECRYPTION:  len = 2
0000: 01 00                                              ..
http-nio-8080-exec-6, RECV TLSv1.2 ALERT:  warning, close_notify
http-nio-8080-exec-6, called closeInternal(false)
http-nio-8080-exec-6, SEND TLSv1.2 ALERT:  warning, description = close_notify
Padded plaintext before ENCRYPTION:  len = 2
0000: 01 00                                              ..
http-nio-8080-exec-6, WRITE: TLSv1.2 Alert, length = 26
[Raw write]: length = 31
0000: 15 03 03 00 1A 00 00 00   00 00 00 00 02 1F A3 99  ................
0010: 4C CD 61 EB 02 2C 14 0D   00 27 03 51 05 F7 1F     L.a..,...'.Q...
http-nio-8080-exec-6, called closeSocket(false)

此时一切都挂起。在客户端上查看 netstat,我看到套接字处于 LAST_ACK 状态,这意味着它正在等待最终的 ACK 数据包。

我从组合中删除了应用 2 的 ELB,并直接连接到应用 2 的服务器。问题仍然存在。当客户端连接处于 LAST_ACK 状态时,服务器端的 netstat 中没有显示相应的连接。如果服务器没有发送最终的 ACK,它会在 FIN_WAIT_2 中,但由于连接已经断开,它一定已经发送了最终的 ACK,但它从未到达客户端。

Spring 引导升级如何导致套接字关闭操作上的 ACK 数据包受阻?

这一切都发生在亚马逊 Linux 2017.9.

更新 2

五个月后,同样的事情。我将 App2 升级到 Spring Boot 2.0.5,没问题。

App1,当升级到 2.0.5 时,开始挂起它的线程。我打开了 HttpClient 线路记录,我看到它即将进行网络调用的地方,线路记录器上没有传出数据,15 分钟 33 秒后我收到一个 "end of stream",请求最终关闭。

哈利路亚,我想通了。我的 HTTP library 正在设置 socketLinger = socketTimeout + 500。然而,socketLinger 和 socketTimeout 是截然不同的!我的 socketTimeout 设置为 5000 毫秒。我发布了 1.7.0 版,它停止设置 socketLinger 和 presto 问题消失了。

我仍然不知道为什么我以前从未见过这个问题,或者为什么 Spring Boot 2 升级导致错误出现,但至少它现在消失了!