How to tell from an Npgsql exception whether a call is worth retrying (transient fault strategy)
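I am writing a service that will connect to a remote Postgres server.
I am looking for a good way to determine which exceptions should be treated as transient (worth retrying), and how to define an appropriate policy for connecting to a remote database.
The service uses Npgsql for data access.
The documentation says Npgsql throws a PostgresException for SQL errors and an NpgsqlException for "server related issues".
So far, the best I have come up with is to assume that every exception that is not a PostgresException is potentially transient and worth retrying, whereas a PostgresException means something is wrong with the query and a retry will not help. Is this assumption correct?
I am using Polly to create the retry and circuit breaker policies.
So my policy looks like this: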
var policy = Policy.Handle<Exception>(AllButPostgresExceptions()) // a PostgresException will not succeed even with a retry, so don't handle it
    .WaitAndRetryAsync(new[]
    {
        TimeSpan.FromSeconds(1),
        TimeSpan.FromSeconds(2),
        TimeSpan.FromSeconds(4)
    }, onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
    .WrapAsync(
        Policy.Handle<Exception>(AllButPostgresExceptions())
            .AdvancedCircuitBreakerAsync(
                failureThreshold: .7,
                samplingDuration: TimeSpan.FromSeconds(30),
                minimumThroughput: 20,
                durationOfBreak: TimeSpan.FromSeconds(30),
                onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
            ));

private static Func<Exception, bool> AllButPostgresExceptions()
{
    // "is" also matches subclasses, unlike comparing GetType() against typeof(...)
    return ex => !(ex is PostgresException);
}
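Is there a better way to determine which errors are likely to be transient?
Update:
Following Shay's suggestion, I opened a new issue in the Npgsql repo and updated my policy as follows: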
// Backing field for the lazily created policy (declaration was missing from the excerpt)
private static Policy postgresTransientPolicy;

public static Policy PostgresTransientFaultPolicy
{
    get
    {
        return postgresTransientPolicy ?? (postgresTransientPolicy = Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
            .WaitAndRetryAsync(
                retryCount: 10,
                sleepDurationProvider: retryAttempt => ExponentialBackoff(retryAttempt, 1.4),
                onRetry: (exception, span) => Log.Warning(exception, "Postgres Retry Failure: "))
            .WrapAsync(
                Policy.Handle<Exception>(PostgresDatabaseTransientErrorDetectionStrategy())
                    .AdvancedCircuitBreakerAsync(
                        failureThreshold: .4,
                        samplingDuration: TimeSpan.FromSeconds(30),
                        minimumThroughput: 20,
                        durationOfBreak: TimeSpan.FromSeconds(30),
                        onBreak: (ex, timeSpan, context) => Log.Warning(ex, "Postgres Circuit Breaker Broken: "),
                        onReset: (context) => Log.Warning("Postgres Circuit Breaker Reset: "),
                        onHalfOpen: () => Log.Warning("Postgres Circuit Breaker Half Open: ")
                    )));
    }
}

private static TimeSpan ExponentialBackoff(int retryAttempt, double exponent)
{
    // TODO: add random 20% variance on the exponent
    // Note: this computes retryAttempt^exponent, which grows polynomially;
    // Math.Pow(exponent, retryAttempt) would give true exponential growth.
    return TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
}

private static Func<Exception, bool> PostgresDatabaseTransientErrorDetectionStrategy()
{
    return (ex) =>
    {
        // If it is not a PostgresException we must assume it will be transient
        if (!(ex is PostgresException pgex))
            return true;

        switch (pgex.SqlState)
        {
            case "53000": // insufficient_resources
            case "53100": // disk_full
            case "53200": // out_of_memory
            case "53300": // too_many_connections
            case "53400": // configuration_limit_exceeded
            case "57P03": // cannot_connect_now
            case "58000": // system_error
            case "58030": // io_error

            // I am not sure whether the next few should be treated as transient, but I am guessing so
            case "55P03": // lock_not_available
            case "55006": // object_in_use
            case "55000": // object_not_in_prerequisite_state
            case "08000": // connection_exception
            case "08003": // connection_does_not_exist
            case "08006": // connection_failure
            case "08001": // sqlclient_unable_to_establish_sqlconnection
            case "08004": // sqlserver_rejected_establishment_of_sqlconnection
            case "08007": // transaction_resolution_unknown
                return true;
        }
        return false;
    };
}
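The TODO in ExponentialBackoff asks for random variance. One way it might be filled in is sketched below, under stated assumptions: the class and method names (Backoff, WithJitter, Delay) are hypothetical, and the sketch applies the 20% variance to the computed delay rather than to the exponent, which is a simpler substitute for what the TODO describes. The growth curve is kept the same as in the code above.

```csharp
using System;

public static class Backoff
{
    // Hypothetical helper, not part of Polly or Npgsql.
    // Scales baseDelay by a random factor in [1 - variance, 1 + variance).
    public static TimeSpan WithJitter(TimeSpan baseDelay, double variance, Random rng)
    {
        if (variance < 0 || variance >= 1)
            throw new ArgumentOutOfRangeException(nameof(variance));

        // NextDouble() returns a value in [0, 1); map it to [-variance, +variance).
        double factor = 1.0 + (rng.NextDouble() * 2.0 - 1.0) * variance;
        return TimeSpan.FromSeconds(baseDelay.TotalSeconds * factor);
    }

    // Same curve as ExponentialBackoff above (retryAttempt^exponent), with 20% jitter.
    public static TimeSpan Delay(int retryAttempt, double exponent, Random rng)
    {
        var baseDelay = TimeSpan.FromSeconds(Math.Pow(retryAttempt, exponent));
        return WithJitter(baseDelay, 0.2, rng);
    }
}
```

It could be wired in as sleepDurationProvider: retryAttempt => Backoff.Delay(retryAttempt, 1.4, rng). Instances of System.Random are not thread-safe, so a shared rng would need a lock, or Random.Shared on .NET 6 and later.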
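Your approach is good. An NpgsqlException usually means a network/IO error, although you can examine the inner exception and check for an IOException to be sure.
A PostgresException is thrown when PostgreSQL reports an error, which in most cases means a problem with the query. However, there can be transient server-side issues (for example, too many connections); you can check the SQL error code for those - see the PG docs.
It might be a good idea to add an IsTransient property to these exceptions, encoding these checks in Npgsql itself - you are welcome to open an issue for that on the Npgsql repo.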
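The two checks described in this answer could be sketched roughly as follows. This is only an illustration under assumptions: TransientHeuristics and both method names are hypothetical, nothing here is Npgsql API, and the code deliberately takes a plain Exception and a SQLSTATE string so that it has no dependency on Npgsql. Treating SocketException as an I/O cause, and treating 40001 and 40P01 as retryable, are additions beyond what the question and answer list.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public static class TransientHeuristics
{
    // Walks the exception and its InnerException chain looking for an I/O or socket failure.
    public static bool HasIoCause(Exception ex)
    {
        for (var current = ex; current != null; current = current.InnerException)
        {
            if (current is IOException || current is SocketException)
                return true;
        }
        return false;
    }

    // Classifies a five-character SQLSTATE; the first two characters are its class.
    public static bool IsTransientSqlState(string sqlState)
    {
        if (string.IsNullOrEmpty(sqlState) || sqlState.Length != 5)
            return false;

        switch (sqlState.Substring(0, 2))
        {
            case "08": // Class 08: connection exception
            case "53": // Class 53: insufficient resources
                return true;
        }

        return sqlState == "40001"  // serialization_failure (assumption: retryable)
            || sqlState == "40P01"  // deadlock_detected (assumption: retryable)
            || sqlState == "57P03"; // cannot_connect_now
    }
}
```

In a Polly predicate, one might call HasIoCause on any exception that is not a PostgresException and IsTransientSqlState(pgex.SqlState) on one that is. Whether a whole class such as 53 should count as transient (disk_full, for instance) depends on the deployment.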