Envoy Circuit Breaking Non-Deterministic Behaviour

Question

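Our experiments with Envoy's circuit breaking show that the results are not deterministic. We demonstrated this by deliberately trying to trip the circuit breaker with the following setup:

The service is a simple web server that returns a 200 after a 2-second delay (the delay ensures the server stays busy between asynchronous requests). A snapshot of our Envoy sidecar configuration shows that circuit breaking is enabled (over HTTP/1.1), with a maximum of 1 connection and 1 pending request: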

circuit_breakers:
   thresholds:
     - priority: "DEFAULT"
       max_connections: 1
       max_pending_requests: 1

Next, we checked that it works by sending a single request to the service, which reliably responds with a 200 as expected.

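However, if we now send 2 asynchronous requests to the service, we see unexpected results. Sometimes it returns 200 for both requests, which should not be possible, because the second request should trip the circuit breaker. At other times, one request returns 200 and the other returns 503 Service Unavailable, which is what we expect to happen. Despite our best efforts we could not achieve any kind of repeatability, which leads us to think this is related to Envoy's underlying concurrency.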
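To make the run-to-run variation easier to see, a small harness can fire the requests concurrently and tally the status codes. This is a minimal sketch, not part of the original repo: the URL is copied from the curl commands in this question, and the names `fetch_status` and `tally_statuses` are hypothetical helpers introduced for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
import urllib.error
import urllib.request

# URL copied from the curl commands in this question.
URL = "http://localhost:3000?status=200&sleep=2"


def fetch_status(url):
    """Send one GET request and return its HTTP status code."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def tally_statuses(n, fetch=fetch_status, url=URL):
    """Send n requests concurrently and count the status codes returned.

    `fetch` is injectable so the tally logic can be exercised without a server.
    """
    with ThreadPoolExecutor(max_workers=n) as pool:
        return Counter(pool.map(fetch, [url] * n))
```

Calling `tally_statuses(2)` several times in a row against the sidecar should show whether each run ends with two 200s or with one 200 and one 503.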

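When we change max_connections and max_pending_requests to larger numbers (>100) and again send too many requests in an attempt to trip the circuit breaker, the inconsistency persists. The number of requests allowed through is roughly correct, but it is sometimes off by a few.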

We would like to understand the reason for this lack of absolute determinism. Any help is much appreciated! The code is in the repo.

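Edit: There is an issue that describes similar unexpected behaviour in detail, but I am still far from finding a solution.

I have included logs from two runs to demonstrate the output:

Sending 3 requests at the same time, 1 gets through.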

❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
**    Trying ::1...
  Trying ::1...
**  TCP_NODELAY set
TCP_NODELAY set
*   Trying ::1...
* TCP_NODELAY set
* Connected to localhost (::1) port 3000 (#0)
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
>>  GET /?status=200&sleep=2 HTTP/1.1
Host: localhost:3000
>>  Host: localhost:3000
User-Agent: curl/7.64.1
>>  User-Agent: curl/7.64.1
Accept: */*
>>  Accept: */*

>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 503 Service Unavailable
< content-length: 81
< content-type: text/plain
< x-envoy-overloaded: true
< date: Wed, 12 Feb 2020 03:36:29 GMT
< server: envoy
<
* Connection #0 to host localhost left intact
upstream connect error or disconnect/reset before headers. reset reason: overflow* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:36:31 GMT
< x-envoy-upstream-service-time: 2007
<
* Connection #0 to host localhost left intact
200* Closing connection 0

Sending 3 requests at the same time, all return 200.

❯ (printf '%s\n' {1..3}) | xargs -I % -P 20 curl -v "http://localhost:3000?status=200&sleep=2"
**    Trying ::1...
  Trying ::1...
**  TCP_NODELAY set
TCP_NODELAY set
* *  Trying ::1...
 *Connected to localhost (::1) port 3000 (#0)
*  TCP_NODELAY set
Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> >Host: localhost:3000
 >GET /?status=200&sleep=2 HTTP/1.1
 User-Agent: curl/7.64.1
>>  Accept: */*
Host: localhost:3000
> >
 User-Agent: curl/7.64.1
> Accept: */*
>
* Connected to localhost (::1) port 3000 (#0)
> GET /?status=200&sleep=2 HTTP/1.1
> Host: localhost:3000
> User-Agent: curl/7.64.1
> Accept: */*
>
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:50 GMT
< x-envoy-upstream-service-time: 2006
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:52 GMT
< x-envoy-upstream-service-time: 4011
<
* Connection #0 to host localhost left intact
200* Closing connection 0
< HTTP/1.1 200 OK
< content-type: text/html; charset=utf-8
< content-length: 3
< server: envoy
< date: Wed, 12 Feb 2020 03:40:54 GMT
< x-envoy-upstream-service-time: 6015
<
* Connection #0 to host localhost left intact
200* Closing connection 0
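In the logs above, the rejected requests carry a 503 status together with an `x-envoy-overloaded: true` header, while the accepted ones carry a 200. The sketch below parses the response lines of `curl -v` output (those starting with `< `) and labels each response accordingly. The function names and the label strings are hypothetical, and treating `x-envoy-overloaded` as the marker of a circuit breaker rejection is an assumption based on these logs.

```python
def parse_response_lines(lines):
    """Extract (status, headers) from the '< ' lines of curl -v output."""
    status = None
    headers = {}
    for line in lines:
        if not line.startswith("< "):
            continue  # skip request lines, curl info lines, and the bare '<'
        body = line[2:].strip()
        if body.startswith("HTTP/"):
            status = int(body.split()[1])
        elif ":" in body:
            name, value = body.split(":", 1)
            headers[name.strip().lower()] = value.strip()
    return status, headers


def classify(status, headers):
    """Label a response as 'ok', 'overflow', or 'other'."""
    # Assumption: a 503 with x-envoy-overloaded set means the request
    # was rejected by the circuit breaker rather than by the service.
    if status == 503 and headers.get("x-envoy-overloaded") == "true":
        return "overflow"
    if status is not None and 200 <= status < 300:
        return "ok"
    return "other"


# Sample taken from the first log in this question.
sample = [
    "< HTTP/1.1 503 Service Unavailable",
    "< content-length: 81",
    "< content-type: text/plain",
    "< x-envoy-overloaded: true",
    "< date: Wed, 12 Feb 2020 03:36:29 GMT",
    "< server: envoy",
    "<",
]
print(classify(*parse_response_lines(sample)))
```

Because the parallel curl processes interleave their output, this only works reliably on output captured per request, not on the mixed stream shown above.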

Answer 1

From one of the contributors here:

The circuit breakers are intended to prevent too much load from propagating through the system, not enforce a strict limit. The system is implemented in a way that is simpler and more performant, but can slightly exceed the limits in some cases. Here's a comment from the implementation of the circuit breaker limit tracking
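One way to picture how a limit can be "slightly exceeded" is a toy model in which checking the limit and updating the counter are two separate steps that can interleave across concurrent requests. This is an illustrative assumption only, not Envoy's actual implementation; `admitted_counts` is a hypothetical helper that enumerates every interleaving and reports how many requests could be admitted.

```python
from itertools import permutations


def admitted_counts(num_requests, limit):
    """Return every possible number of admitted requests.

    Each request performs a 'check' (is the counter below the limit?)
    followed later by a 'commit' (increment the counter if admitted).
    All orderings that keep each request's check before its commit
    are explored.
    """
    steps = [(r, phase) for r in range(num_requests)
             for phase in ("check", "commit")]
    outcomes = set()
    for order in permutations(steps):
        # Discard orderings where a request commits before it checks.
        if any(order.index((r, "check")) > order.index((r, "commit"))
               for r in range(num_requests)):
            continue
        active = 0
        admitted = set()
        for r, phase in order:
            if phase == "check":
                if active < limit:
                    admitted.add(r)
            elif r in admitted:
                active += 1
        outcomes.add(len(admitted))
    return outcomes


print(admitted_counts(2, 1))  # two requests racing against a limit of 1
```

In this model, two requests against a limit of 1 can end with either one or two admitted, depending on whether both checks happen before the first commit. Making check and increment a single atomic step would remove the overshoot at the cost of extra synchronization, which matches the trade-off the quote describes.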

concurrency

networking

circuit-breaker

envoyproxy