Unable to prevent requests from reaching the endpoint after circuit breaking is in action

I'm trying to verify linkerd's circuit-breaking configuration by sending requests to a simple, deliberately failing endpoint, deployed as a pod in the same k8s cluster where linkerd runs as a DaemonSet.

From the logs I can see that circuit breaking has occurred, but when I hit the endpoint again I still receive a response from it.

Setup and test

I set up linkerd and its endpoint using the following configurations:

https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd-egress.yaml

https://raw.githubusercontent.com/zillani/kubex/master/examples/simple-err.yml
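
For reference, both manifests can be applied directly, assuming kubectl is already pointed at the target cluster:

  kubectl apply -f https://raw.githubusercontent.com/linkerd/linkerd-examples/master/k8s-daemonset/k8s/linkerd-egress.yaml
  kubectl apply -f https://raw.githubusercontent.com/zillani/kubex/master/examples/simple-err.yml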

Endpoint behavior:

The endpoint always returns 500 Internal Server Error.

Failure accrual settings: defaults, with responseClassifier: retryable5XX
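
For context, failure accrual and the response classifier live in the linkerd 1.x router config (exact placement varies slightly by version). A minimal sketch of the relevant section, with illustrative thresholds and backoff values rather than the ones actually used above, might look like:

  routers:
  - protocol: http
    service:
      responseClassifier:
        kind: io.l5d.http.retryableRead5XX   # classify 5XXs on reads as retryable
    client:
      failureAccrual:
        kind: io.l5d.consecutiveFailures     # trip after N consecutive failures
        failures: 5                          # illustrative threshold
        backoff:
          kind: constant
          ms: 10000                          # how long the node stays marked dead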

Proxied curl:

http_proxy=$(kubectl get svc l5d -o jsonpath="{.status.loadBalancer.ingress[0].*}"):4140 curl -L http://<loadbalancer-ingress>:8080/simple-err

Observations

1. In the admin metrics

  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/connects" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/dtab/size.count" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failed_connect_latency_ms.count" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/probes" : 8,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removals" : 2,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/removed_for_ms" : 268542,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failure_accrual/revivals" : 0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 505,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/adds" : 2,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/algorithm/p2c_least_loaded" : 1.0,
  "rt/outgoing/client/$/io.buoyant.rinet/8080/<loadbalancer-ingress>/loadbalancer/available" : 2.0,

 "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/failures/com.twitter.finagle.service.ResponseClassificationSyntheticException" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/pending" : 0.0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/request_latency_ms.count" : 0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/requests" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget" : 100.0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/budget_exhausted" : 5,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/per_request.count" : 0,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/retries/total" : 500,
  "rt/outgoing/service/svc/<loadbalancer-ingress>:8080/success" : 0,

2. In the logs

I 0518 10:31:15.816 UTC THREAD23 TraceId:e57aa1baa5148cc5: FailureAccrualFactory marking connection to "$/io.buoyant.rinet/8080/<loadbalancer-ingress>" as dead.
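
To watch for these events yourself, something like the following should work, assuming the DaemonSet and its container are both named l5d as in the linkerd-examples manifests:

  kubectl logs ds/l5d -c l5d | grep "marking connection"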

Problem

After the node has been marked as dead, new requests through linkerd (the http_proxy command above) still reach the endpoint and return a response.

This question has been answered on the Linkerd community forum. Adding the answer here as well, for completeness:

When failure accrual (circuit breaker) triggers, the endpoint is put into a state called Busy. This actually doesn't guarantee that the endpoint won't be used. Most load balancers (including the default P2CLeastLoaded) will simply pick the healthiest endpoint. In the case where failure accrual has triggered on all endpoints, this means it will have to pick one in the Busy state.
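
To make that concrete, here is a deliberately simplified power-of-two-choices sketch (hypothetical Python, not Finagle's actual implementation): a Busy endpoint loses against a healthy one, but when every endpoint is Busy the balancer still has to hand one back.

  import random

  class Endpoint:
      def __init__(self, name, load, busy):
          self.name = name    # endpoint address
          self.load = load    # outstanding requests
          self.busy = busy    # True once failure accrual marks it dead

  def p2c_pick(endpoints):
      # Pick two endpoints at random; prefer a non-Busy one,
      # otherwise fall back to whichever carries less load.
      a, b = random.sample(endpoints, 2)
      if a.busy != b.busy:
          return b if a.busy else a
      return a if a.load <= b.load else b

  # Both endpoints are Busy (circuit-broken), yet one is still returned:
  endpoints = [Endpoint("pod-a:8080", 3, True), Endpoint("pod-b:8080", 1, True)]
  print(p2c_pick(endpoints).name)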