ASP.NET Core health checks randomly fail with TaskCanceledException or OperationCanceledException
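I've implemented health checks in my ASP.NET Core application.
The health check endpoint runs 2 checks: the DbContext connection check and a custom one that checks an NpgsqlConnection.
Everything works fine in over 99% of cases. Occasionally, the health check fails with a TaskCanceledException or an OperationCanceledException. From my logs I can see that the exception is thrown after around 2ms-25ms (so no timeout can possibly have occurred).
Important hint:
When I hit the health checks many times (a simple F5 in the browser), it throws the exception. It looks like you can't hit the /health endpoint before the previous health check has completed. If this is the case, why? Even if I put Thread.Sleep(5000);
into the custom health check (with no DB connection check at all), it fails if I hit the /health
endpoint again before the 5 seconds have passed.
QUESTION: Is the health check somehow 'magically' single-threaded (when you hit the endpoint again, it cancels the previous health check invocation)?
Startup.cs ConfigureServices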
services
.AddHealthChecks()
.AddCheck<StorageHealthCheck>("ReadOnly Persistance")
.AddDbContextCheck<MyDbContext>("EFCore persistance");
Startup.cs Configure
if (env.IsDevelopment())
{
app.UseDeveloperExceptionPage();
}
else
{
app.UseHsts();
}
app.UseHttpsRedirection();
app.UseCors(options => options.AllowAnyOrigin().AllowAnyMethod().AllowAnyHeader());
app.UseMiddleware<RequestLogMiddleware>();
app.UseMiddleware<ErrorLoggingMiddleware>();
if (!env.IsProduction())
{
app.UseSwagger();
app.UseSwaggerUI(c =>
{
c.SwaggerEndpoint("/swagger/v1/swagger.json", "V1");
c.SwaggerEndpoint($"/swagger/v2/swagger.json", $"V2");
});
}
app.UseHealthChecks("/health", new HealthCheckOptions()
{
ResponseWriter = WriteResponse
});
app.UseMvc();
StorageHealthCheck.cs
public class StorageHealthCheck : IHealthCheck
{
private readonly IMediator _mediator;
public StorageHealthCheck(IMediator mediator)
{
_mediator = mediator;
}
public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default(CancellationToken))
{
var isReadOnlyHealthy = await _mediator.Send(new CheckReadOnlyPersistanceHealthQuery());
return new HealthCheckResult(isReadOnlyHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy, null);
}
}
CheckReadOnlyPersistanceHealthQueryHandler:
NpgsqlConnectionStringBuilder csb = new NpgsqlConnectionStringBuilder(_connectionString.Value);
string sql = $@"
SELECT * FROM pg_database WHERE datname = '{csb.Database}'";
try
{
using (IDbConnection connection = new NpgsqlConnection(_connectionString.Value))
{
connection.Open();
var stateAfterOpening = connection.State;
if (stateAfterOpening != ConnectionState.Open)
{
return false;
}
connection.Close();
return true;
}
}
catch
{
return false;
}
TaskCanceledException:
System.Threading.Tasks.TaskCanceledException: A task was canceled.
at Npgsql.TaskExtensions.WithCancellation[T](Task`1 task, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnector.ConnectAsync(NpgsqlTimeout timeout, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
at Npgsql.NpgsqlConnection.<>c__DisplayClass32_0.<<Open>g__OpenLong|0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlDatabaseCreator.ExistsAsync(CancellationToken cancellationToken)
at Microsoft.Extensions.Diagnostics.HealthChecks.DbContextHealthCheck`1.CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken)
at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)
OperationCanceledException:
System.OperationCanceledException: The operation was canceled.
at System.Threading.CancellationToken.ThrowOperationCanceledException()
at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)
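My best theory, after testing in a large production environment, is that you need to await any writes to the HTTP context output stream in the health check. I was getting this error in a method that returned a task that was not awaited. Awaiting the task appears to have solved the problem. The nice thing about await is that you can also catch a TaskCanceledException
and just eat it.
Example: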
// map health checks
endpoints.MapHealthChecks("/health-check", new HealthCheckOptions
{
ResponseWriter = HealthCheckExtensions.WriteJsonResponseAsync,
Predicate = check => check.Name == "default"
});
/// <summary>
/// Write a json health check response
/// </summary>
/// <param name="context">Http context</param>
/// <param name="report">Report</param>
/// <returns>Task</returns>
public static async Task WriteJsonResponseAsync(HttpContext context, HealthReport report)
{
try
{
HealthReportEntry entry = report.Entries.Values.FirstOrDefault();
context.Response.ContentType = "application/json; charset=utf-8";
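await JsonSerializer.SerializeAsync(context.Response.Body, entry.Data, entry.Data.GetType());
}
// The BCL type is spelled TaskCanceledException (one "l"); it derives from OperationCanceledException.
catch (TaskCanceledException)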
{
}
}
I've finally found the answer.
The root cause is that when the HTTP request is aborted, the httpContext.RequestAborted
CancellationToken is triggered, and an exception (OperationCanceledException
) is thrown.
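To illustrate the mechanism, here is a minimal sketch that uses only the BCL and no ASP.NET Core at all. The names AbortedRequestDemo and SlowCheckAsync are made up for this example, and the CancellationTokenSource merely stands in for httpContext.RequestAborted. It is meant to show why a refresh (which makes the browser drop the pending request) can surface as a cancellation exception within milliseconds, not to reproduce the framework's actual code path.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AbortedRequestDemo
{
    // Hypothetical stand-in for a health check that honours the token it receives.
    public static async Task<string> SlowCheckAsync(CancellationToken cancellationToken)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        return "Healthy";
    }

    // Starts the check, then cancels the token the way an aborted request would.
    public static async Task<string> RunAsync()
    {
        using var requestAborted = new CancellationTokenSource(); // assumption: plays the role of RequestAborted
        Task<string> check = SlowCheckAsync(requestAborted.Token);

        requestAborted.Cancel(); // simulates the client dropping the connection (e.g. pressing F5)

        try
        {
            return await check;
        }
        catch (OperationCanceledException ex)
        {
            // TaskCanceledException derives from OperationCanceledException,
            // so this single catch covers both exceptions from the question.
            return ex.GetType().Name;
        }
    }
}
```

Calling AbortedRequestDemo.RunAsync() returns long before the 5 second delay elapses, because the cancellation ends the delay immediately. That matches the 2ms-25ms timings seen in the logs.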
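I have a global exception handler in my application, and I had been converting every unhandled exception into a 500
error. Even though the client had aborted the request and never received the 500
response, my logs still recorded it.
The solution I implemented looks like this: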
public async Task Invoke(HttpContext context)
{
try
{
await _next(context);
}
catch (Exception ex)
{
if (context.RequestAborted.IsCancellationRequested)
{
_logger.LogWarning(ex, "RequestAborted. " + ex.Message);
return;
}
_logger.LogCritical(ex, ex.Message);
await HandleExceptionAsync(context, ex);
throw;
}
}
private static Task HandleExceptionAsync(HttpContext context, Exception ex)
{
var code = HttpStatusCode.InternalServerError; // 500 if unexpected
//if (ex is MyNotFoundException) code = HttpStatusCode.NotFound;
//else if (ex is MyUnauthorizedException) code = HttpStatusCode.Unauthorized;
//else if (ex is MyException) code = HttpStatusCode.BadRequest;
var result = JsonConvert.SerializeObject(new { error = ex.Message });
context.Response.ContentType = "application/json";
context.Response.StatusCode = (int)code;
return context.Response.WriteAsync(result);
}
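As a possible refinement of the check above, one could treat an exception as a client abort only when it is a cancellation exception and the request token has been cancelled. The sketch below is an assumption on my part and not part of the original solution. ClientAbortFilter, IsClientAbort and ClassifyAsync are hypothetical names, and a plain CancellationToken stands in for context.RequestAborted so the snippet stays free of ASP.NET Core dependencies.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ClientAbortFilter
{
    // Hypothetical helper: true only when a cancellation exception
    // coincides with an already-cancelled request token.
    public static bool IsClientAbort(Exception ex, CancellationToken requestAborted) =>
        ex is OperationCanceledException && requestAborted.IsCancellationRequested;

    // Runs 'next' and reports how a middleware could classify the outcome.
    public static async Task<string> ClassifyAsync(Func<Task> next, CancellationToken requestAborted)
    {
        try
        {
            await next();
            return "completed";
        }
        catch (Exception ex) when (IsClientAbort(ex, requestAborted))
        {
            return "client-abort"; // a warning-level log entry would fit here
        }
        catch (Exception)
        {
            return "server-error"; // anything else is still a genuine failure
        }
    }
}
```

The difference from the middleware above is that an unrelated failure (for example an InvalidOperationException) that happens to coincide with an aborted request is still classified as an error here, so it would not be downgraded to a warning.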
Hope this helps.