Unexpected behaviour using nested Retry and Circuit Breaker policies of Polly.Net

Question

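I have built a resilience strategy from a retry policy and a circuit-breaker policy. It works, but there is a problem with its behaviour.

I noticed that when the circuit breaker is half-open and the onBreak() event fires again to re-open the circuit, one additional retry is triggered by the retry policy (on top of the health-verification call made in the half-open state).

Let me explain step by step:

I have defined two strongly-typed policies, one for retry and one for the circuit breaker: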

static Policy<HttpResponseMessage> customRetryPolicy;
static Policy<HttpResponseMessage> customCircuitBreakerPolicy;

static HttpStatusCode[] httpStatusesToProcess = new HttpStatusCode[]
{
   HttpStatusCode.ServiceUnavailable,  //503
   HttpStatusCode.InternalServerError, //500
};

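The retry policy works this way: two (2) retries per request, waiting five (5) seconds between retries. If the inner circuit breaker is open, it must not retry. It retries only on HTTP statuses 500 and 503.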

customRetryPolicy = Policy<HttpResponseMessage>

// Intended: do not retry when the circuit is open.
// The predicate is always false for a BrokenCircuitException, so that exception is never handled.
.Handle<BrokenCircuitException>( x => 
{
    return !(x is BrokenCircuitException);
})

// Handle an AggregateException unless its inner exception is a BrokenCircuitException
.OrInner<AggregateException>(x =>
{
    return !(x.InnerException is BrokenCircuitException);
})

// Retry when the response status is one of httpStatusesToProcess (500 or 503)
.OrResult(x => { return httpStatusesToProcess.Contains(x.StatusCode); })

// Retry the request twice, waiting 5 seconds before each retry
.WaitAndRetry( 2, retryAttempt => TimeSpan.FromSeconds(5),
    (exception, timeSpan, retryCount, context) =>
    {
        System.Console.WriteLine("Retrying... " + retryCount);
    }
);

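The circuit-breaker policy works this way: allow at most three (3) consecutive failures, then open the circuit for thirty (30) seconds. The circuit opens only for HTTP-500.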

customCircuitBreakerPolicy = Policy<HttpResponseMessage>

// Count an AggregateException that wraps an HttpRequestException as a failure
.Handle<AggregateException>(x => 
    { return x.InnerException is HttpRequestException; })

// Count a response as a failure only when it is HTTP 500 (InternalServerError)
.OrResult(x => { return (int) x.StatusCode == 500; })

// Break after 3 consecutive failures
// and hold the circuit open for 30 seconds
.CircuitBreaker(3, TimeSpan.FromSeconds(30),
    onBreak: (lastResponse, breakDelay) =>{
        System.Console.WriteLine("\n Circuit broken!");
    },
    onReset: () => {
        System.Console.WriteLine("\n Circuit Reset!");
    },
    onHalfOpen: () => {
        System.Console.WriteLine("\n Circuit is Half-Open");
    });

Finally, the two policies are nested this way:

try
{
    customRetryPolicy.Execute(() =>
    customCircuitBreakerPolicy.Execute(() => {

        // For testing purposes, "api/values" always returns 500
        HttpResponseMessage msResponse
            = GetHttpResponseAsync("api/values").Result;

        // Only prints messages to the console; not relevant to the problem
        PrintHttpResponseAsync(msResponse);

        return msResponse;

   }));
}
catch (BrokenCircuitException e)
{
    System.Console.WriteLine("CB Error: " + e.Message);
}

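What result did I expect?

1. The first server response is HTTP-500 (as expected)
2. Retry #1 fails (as expected)
3. Retry #2 fails (as expected)
4. Because there have been three failures, the circuit breaker is now open (as expected)
5. Great! Everything works so far.
6. The circuit breaker stays open for the next thirty (30) seconds (as expected)
7. Thirty seconds later, the circuit breaker is half-open (as expected)
8. One attempt is made to check the endpoint health (as expected)
9. The server response is HTTP-500 (as expected)
10. The circuit breaker is open for the next thirty (30) seconds (as expected)
11. Here is the problem: an additional retry is launched when the circuit breaker is already open!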

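See the image:

I am trying to understand this behaviour. Why is one additional retry executed when the circuit breaker opens for the second, third, ..., Nth time?

I have reviewed the state-machine models for the retry and circuit-breaker policies, but I do not understand why this additional retry is performed.

Flow for the circuit breaker: https://github.com/App-vNext/Polly/wiki/Circuit-Breaker#putting-it-all-together-

Flow for the retry policy: https://github.com/App-vNext/Polly/wiki/Retry#how-polly-retry-works

This matters because the retry wait time (5 seconds in this example) is still spent, and under high concurrency that is wasted time.

Any help or direction will be appreciated. Many thanks.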

Answer 1

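With Polly.Context you can exchange information between the two policies (in your case: retry and circuit breaker). The Context is basically a Dictionary<string, object>.

So the trick is to set a key on the context in onBreak and then use that value inside the sleepDurationProvider.

Let's start with the inner circuit-breaker policy: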

static IAsyncPolicy<HttpResponseMessage> GetCircuitBreakerPolicy()
{
    return Policy<HttpResponseMessage>
        .HandleResult(res => res.StatusCode == HttpStatusCode.InternalServerError)
        .CircuitBreakerAsync(3, TimeSpan.FromSeconds(2),
           // Publish the break duration so the outer retry can wait that long
           onBreak: (dr, ts, ctx) => { ctx[SleepDurationKey] = ts; },
           // Remove the key (rather than storing null) so ContainsKey is false again
           onReset: (ctx) => { ctx.Remove(SleepDurationKey); });
}

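It breaks after 3 consecutive failed requests
It stays in the Open state for 2 seconds before it transitions to HalfOpen
It sets a key on the context with the durationOfBreak value
When the CB goes back to the "normal" Closed state (onReset), it deletes this value

Now, let's continue with the retry policy: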

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy<HttpResponseMessage>
    .HandleResult(res => res.StatusCode == HttpStatusCode.InternalServerError)
    .Or<BrokenCircuitException>()
    .WaitAndRetryAsync(4,
        sleepDurationProvider: (c, ctx) =>
        {
            if (ctx.ContainsKey(SleepDurationKey))
                return (TimeSpan)ctx[SleepDurationKey];
            return TimeSpan.FromMilliseconds(200);
        },
        onRetry: (dr, ts, ctx) =>
        {
            Console.WriteLine($"Context: {(ctx.ContainsKey(SleepDurationKey) ? "Open" : "Closed")}");
            Console.WriteLine($"Waits: {ts.TotalMilliseconds}");
        });
}

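It triggers when the StatusCode is 500
- or when there is a BrokenCircuitException
It triggers at most 4 times (so, 5 attempts in total)
It sets the sleep duration based on the context
- If the key is not present in the context (the CB is not in the Open state), it returns 200 milliseconds
- If the key is present in the context (the CB is in the Open state), it returns the value from the context
  - Note: you can add a few hundred milliseconds to this value to avoid a race condition
It prints some values to the console inside onRetry, for debugging purposes only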
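The delay selection described above, including the suggested padding, can be sketched as a small helper that is independent of Polly. This is a minimal sketch under assumptions: `RetryDelay`, `Choose`, and the `padding` parameter are hypothetical names introduced for illustration, and a plain `IDictionary<string, object>` stands in for the Polly context.

```csharp
using System;
using System.Collections.Generic;

public static class RetryDelay
{
    // Same key the circuit breaker writes in onBreak.
    public const string SleepDurationKey = "Broken";

    // Returns the published break duration plus padding when the key holds a
    // TimeSpan; otherwise (key missing or value null) returns the default delay.
    public static TimeSpan Choose(
        IDictionary<string, object> context,
        TimeSpan defaultDelay,
        TimeSpan padding)
    {
        if (context.TryGetValue(SleepDurationKey, out var value)
            && value is TimeSpan breakDuration)
        {
            return breakDuration + padding;
        }

        return defaultDelay;
    }
}
```

Because Polly's Context implements IDictionary<string, object>, a helper like this could be called from the sleepDurationProvider, for example `(attempt, ctx) => RetryDelay.Choose(ctx, TimeSpan.FromMilliseconds(200), TimeSpan.FromMilliseconds(300))`. The type check also keeps a null value from being cast to TimeSpan.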

Finally, let's wire up the policies and test them:

const string SleepDurationKey = "Broken"; 
static HttpClient client = new HttpClient();
static async Task Main()
{
    var strategy = Policy.WrapAsync(GetRetryPolicy(), GetCircuitBreakerPolicy());
    await strategy.ExecuteAsync(async () => await Get());
}

static Task<HttpResponseMessage> Get()
{
    return client.GetAsync("https://httpstat.us/500");
}

It uses the http://httpstat.us website to simulate an overloaded downstream
It combines/chains the two policies (CB inner, Retry outer)
It calls the Get method asynchronously

When `handledEventsAllowedBeforeBreaking` is 2

Output

Context: Closed
Waits: 200
Context: Open
Waits: 2000
Context: Open
Waits: 2000
Context: Open
Waits: 2000

When `handledEventsAllowedBeforeBreaking` is 3

Output

Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Open
Waits: 2000
Context: Open
Waits: 2000

When `handledEventsAllowedBeforeBreaking` is 4

Output

Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Closed
Waits: 200
Context: Open
Waits: 2000
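The pattern in the three outputs above can be written down as a small function: retries that happen before the breaker has seen enough failures wait 200 ms, and every retry from that point on waits the 2000 ms break duration. This is only a sketch of the pattern read off these outputs, not a description of Polly internals; `ExpectedWaits` and its parameter names are hypothetical.

```csharp
using System.Collections.Generic;

public static class ExpectedWaits
{
    // Returns the wait (in ms) before each retry when every attempt fails.
    public static List<int> For(
        int handledEventsAllowedBeforeBreaking,
        int retryCount = 4,
        int closedWaitMs = 200,
        int openWaitMs = 2000)
    {
        var waits = new List<int>();
        for (int retry = 1; retry <= retryCount; retry++)
        {
            // Retry number N follows failed attempt number N, so the breaker
            // has already opened once N reaches the failure threshold.
            bool breakerOpen = retry >= handledEventsAllowedBeforeBreaking;
            waits.Add(breakerOpen ? openWaitMs : closedWaitMs);
        }
        return waits;
    }
}
```

For example, `ExpectedWaits.For(3)` yields 200, 200, 2000, 2000, which matches the second output listed above.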

Answer 2

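I tried the same scenario using the sample provided by Polly, Async Demo06_WaitAndRetryNestingCircuitBreaker.cs. I found it in Polly-Samples/PollyTestClient/Samples/.

See the sample here: Official Samples provided by Polly

Running it confirmed that this behaviour does not happen only with the sample I provided.

After this confirmation, I added an additional condition to the retry policy evaluation so that it retries depending on the circuit-breaker state. This worked!

Note the new condition in the .OrResult() delegate: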

 customRetryPolicy = Policy<HttpResponseMessage>

 .Handle<BrokenCircuitException>( 
    x => { return !(x is BrokenCircuitException); })

 .OrInner<AggregateException>(
    x => { return !(x.InnerException is BrokenCircuitException); })

// Retry only when both the HTTP status and the circuit-breaker state allow it
.OrResult(x => { 
    return httpStatusesToProcess.Contains(x.StatusCode) 

    // This condition checks the current circuit-breaker state
    // before each retry
        && ((CircuitBreakerPolicy<HttpResponseMessage>) 
        customCircuitBreakerPolicy).CircuitState == CircuitState.Closed
    ;
})
.WaitAndRetry( 2, retryAttempt => TimeSpan.FromSeconds(1),
    (exception, timeSpan, retryCount, context) =>
    {
        System.Console.WriteLine("Retrying... " + retryCount);
    }
);

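Here is the result:

My assumption was that the retry policy (the outermost policy) would be able to control whether or not to retry by checking the circuit-breaker state. That does happen, but for some reason, when the circuit breaker is half-open and the health request gets a negative response, one more attempt is executed as a penalty during the transition from half-open back to open (as @peter-csala said before).

When this happens, I am forced to evaluate the circuit-breaker state myself. But I think Polly should do this on its own.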
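The half-open sequence discussed in this thread can be traced with a small stand-in that does not use Polly at all. This is a simplified sketch under stated assumptions: `TinyBreaker`, `RetryTrace`, and `CircuitOpenException` are hypothetical types written for illustration, the endpoint always returns 500, and a manually advanced clock replaces real waiting. It only models a consecutive-failure breaker wrapped by a retry that handles 500 but lets the open-circuit exception propagate.

```csharp
using System;
using System.Collections.Generic;

public enum BreakerState { Closed, Open, HalfOpen }

public sealed class CircuitOpenException : Exception { }

// Hypothetical stand-in for a consecutive-failure circuit breaker.
public sealed class TinyBreaker
{
    private readonly int _threshold;
    private readonly TimeSpan _breakDuration;
    private int _failures;
    private DateTime _openUntil;

    public BreakerState State { get; private set; } = BreakerState.Closed;

    public TinyBreaker(int threshold, TimeSpan breakDuration)
    {
        _threshold = threshold;
        _breakDuration = breakDuration;
    }

    // Runs the call unless the circuit is open; status 500 counts as a failure.
    public int Execute(Func<int> call, DateTime now)
    {
        if (State == BreakerState.Open)
        {
            if (now < _openUntil) throw new CircuitOpenException();
            State = BreakerState.HalfOpen; // break elapsed: allow one probe
        }

        int status = call();
        if (status == 500)
        {
            _failures++;
            // A failed probe re-opens immediately; otherwise open at the threshold.
            if (State == BreakerState.HalfOpen || _failures >= _threshold)
            {
                State = BreakerState.Open;
                _openUntil = now + _breakDuration;
            }
        }
        else
        {
            _failures = 0;
            State = BreakerState.Closed;
        }
        return status;
    }
}

public static class RetryTrace
{
    // Outer retry: retries on 500, lets CircuitOpenException end the run.
    public static List<string> Run(
        TinyBreaker breaker, int retries, ref DateTime clock, TimeSpan wait)
    {
        var log = new List<string>();
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                int status = breaker.Execute(() => 500, clock);
                log.Add($"attempt {attempt}: {status}, breaker {breaker.State}");
                if (status != 500 || attempt == retries) return log;
            }
            catch (CircuitOpenException)
            {
                log.Add($"attempt {attempt}: rejected, breaker {breaker.State}");
                return log;
            }
            clock += wait; // the retry delay is paid before the next attempt
        }
    }
}
```

With a threshold of 3, a 30-second break, 2 retries, and a 5-second wait, the first run logs three 500 responses and leaves the breaker open. A second run started after the break has elapsed logs the failed probe, waits the 5-second retry delay because a 500 result is something the retry handles, and only then is rejected by the open breaker. In this model, that wait before the rejected attempt is the extra retry.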


Tags: c#, .net-core, polly, asp.net-core
