Istio sidecar causes Java gRPC client to throw "UNAVAILABLE: upstream connect error or disconnect/reset before headers" under high-concurrency load
I have two gRPC services, one calling the other via plain unary gRPC methods (no streaming on either side). I use Istio as the service mesh, with the sidecar injected into the Kubernetes pods of both services.
The gRPC calls work fine under normal load, but under high-concurrency load the gRPC client keeps throwing the following exception:
io.grpc.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers
        at io.grpc.Status.asRuntimeException(Status.java:526)
        at io.grpc.stub.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
        at io.grpc.internal.CensusStatsModule$StatsClientInterceptor.onClose(CensusStatsModule.java:678)
        at io.grpc.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
        at io.grpc.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
        at io.grpc.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
        at io.grpc.internal.CensusTracingModule$TracingClientInterceptor.onClose(CensusTracingModule.java:397)
        at io.grpc.internal.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
        at io.grpc.internal.ClientCallImpl.access0(ClientCallImpl.java:63)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImpl.access0(ClientCallImpl.java:467)
        at io.grpc.internal.ClientCallImpl$ClientStreamListenerImplStreamClosed.runInContext(ClientCallImpl.java:584)
        at io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)
        at io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
Meanwhile, there are no exceptions on the server side, and the istio-proxy container of the client pod logs no errors either. However, if I disable Istio sidecar injection so that the two services talk to each other directly, these errors do not occur.
Can anyone tell me why this happens, and how to fix it?
Many thanks.
Finally found the cause: it is the Envoy sidecar's default circuitBreakers settings. By default, the options max_pending_requests and max_requests are set to 1024, and the default connect_timeout is 1s. So under high-concurrency load, when the server side has too many pending requests waiting to be served, the sidecar's circuit breaker kicks in and tells the client side that the server's upstream is UNAVAILABLE.
To fix the problem, you need to apply a DestinationRule for the target service with reasonable trafficPolicy settings.
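For illustration, a minimal sketch of such a DestinationRule is shown below. The rule name, host, and limit values are placeholders that you would tune to your own service and expected load; the connectionPool fields are the Istio settings that map onto Envoy's circuit breaker thresholds (http1MaxPendingRequests → max_pending_requests, http2MaxRequests → max_requests, connectTimeout → connect_timeout).

```yaml
# Sketch only: name, host, and limits are placeholders to adapt to your service.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: grpc-server
spec:
  host: grpc-server.default.svc.cluster.local   # FQDN of the callee service
  trafficPolicy:
    connectionPool:
      tcp:
        maxConnections: 1000          # upper bound on TCP connections to the upstream
        connectTimeout: 5s            # raises Envoy's connect_timeout
      http:
        http1MaxPendingRequests: 10000   # maps to Envoy max_pending_requests (default 1024)
        http2MaxRequests: 10000          # maps to Envoy max_requests (default 1024)
```

Apply it with kubectl apply -f and the client-side sidecar raises its per-cluster thresholds accordingly. Note that these limits also exist to protect the server from overload, so size them against what the server can actually handle rather than simply setting them very high.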