Istio sidecar causes Java gRPC client to throw "UNAVAILABLE: upstream connect error or disconnect/reset before headers" under high-concurrency load

I have two gRPC services; one calls the other through plain unary gRPC methods (no streaming on either side). I use Istio as the service mesh, and the sidecar is injected into the Kubernetes pods of both services.

The gRPC calls work fine under normal load, but under high-concurrency load the gRPC client keeps throwing the following exception:

i.g.StatusRuntimeException: UNAVAILABLE: upstream connect error or disconnect/reset before headers
    at io.grpc.Status.asRuntimeException(Status.java:526)
    at i.g.s.ClientCalls$StreamObserverToCallListenerAdapter.onClose(ClientCalls.java:434)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusStatsModule$StatsClientInterceptor.onClose(CensusStatsModule.java:678)
    at i.g.PartialForwardingClientCallListener.onClose(PartialForwardingClientCallListener.java:39)
    at i.g.ForwardingClientCallListener.onClose(ForwardingClientCallListener.java:23)
    at i.g.ForwardingClientCallListener$SimpleForwardingClientCallListener.onClose(ForwardingClientCallListener.java:40)
    at i.g.i.CensusTracingModule$TracingClientInterceptor.onClose(CensusTracingModule.java:397)
    at i.g.i.ClientCallImpl.closeObserver(ClientCallImpl.java:459)
    at i.g.i.ClientCallImpl.access0(ClientCallImpl.java:63)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.close(ClientCallImpl.java:546)
    at i.g.i.ClientCallImpl$ClientStreamListenerImpl.access0(ClientCallImpl.java:467)
    at i.g.i.ClientCallImpl$ClientStreamListenerImplStreamClosed.runInContext(ClientCallImpl.java:584)
    at i.g.i.ContextRunnable.run(ContextRunnable.java:37)
    at i.g.i.SerializingExecutor.run(SerializingExecutor.java:123)
    at j.u.c.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at j.u.c.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Meanwhile, there are no exceptions on the server side, and the istio-proxy container of the client pod reports no errors either. However, if I disable Istio sidecar injection so that the two services talk to each other directly, these errors do not occur.

Can anyone tell me why this happens, and how to fix it?

Thanks a lot.

Finally found the cause: it is the Envoy sidecar's default circuit breaker (circuit_breakers) settings. By default, max_pending_requests and max_requests are set to 1024, and the default connect_timeout is 1s, so under high-concurrency load, when the server side has too many pending requests waiting to be served, the sidecar's circuit breaker kicks in and reports to the client that the server-side upstream is UNAVAILABLE.
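For reference, these limits correspond to the circuit_breakers thresholds in the Envoy cluster configuration that the sidecar generates for the upstream service. With no DestinationRule applied, they fall back to Envoy's documented defaults, roughly equivalent to the fragment below (illustrative only; the exact values and connect_timeout depend on your Istio version). You can dump the actual values for your deployment with `istioctl proxy-config cluster <client-pod>`.

    # Illustrative fragment of the generated Envoy cluster config -- not something
    # you edit by hand, shown only to make the defaults above concrete.
    circuit_breakers:
      thresholds:
      - priority: DEFAULT
        max_connections: 1024        # concurrent upstream connections
        max_pending_requests: 1024   # requests queued waiting for a connection/stream
        max_requests: 1024           # concurrent requests (gRPC runs over HTTP/2)
        max_retries: 3               # concurrent retries
    connect_timeout: 1s              # value observed on the Istio version in question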

To fix the problem, you need to apply a DestinationRule for the target service with reasonable trafficPolicy settings that raise these limits.
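Below is a minimal sketch of such a DestinationRule, assuming the upstream is a Kubernetes service named grpc-server in the default namespace; the host name and the numbers are placeholders, so size them against the load your server can actually handle.

    # Hypothetical DestinationRule that raises the sidecar's circuit breaker
    # limits for the gRPC server; host and limit values are placeholders.
    apiVersion: networking.istio.io/v1alpha3
    kind: DestinationRule
    metadata:
      name: grpc-server
    spec:
      host: grpc-server.default.svc.cluster.local
      trafficPolicy:
        connectionPool:
          tcp:
            maxConnections: 4096          # raises Envoy max_connections
            connectTimeout: 10s           # more headroom than the 1s default
          http:
            http1MaxPendingRequests: 4096 # raises Envoy max_pending_requests
            http2MaxRequests: 4096        # raises Envoy max_requests (gRPC is HTTP/2)
            maxRequestsPerConnection: 0   # 0 = unlimited requests per connection

Apply it with kubectl; Istio pushes the updated cluster configuration to the sidecars dynamically, so no pod restart is needed. Keep in mind that the circuit breaker exists to protect the upstream, so raising the limits only moves the breaking point rather than removing it.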