AWS Aurora Serverless - Communication Link Failure
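I am using a MySQL Aurora Serverless cluster (with the Data API enabled) from my Python code, and I am getting a communications link failure exception. This usually happens after the cluster has been dormant for some time.
Once the cluster is active, however, I no longer get any error. Each time, I have to send 3-4 requests before it works.
Exception details: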
The last packet sent successfully to the server was 0 milliseconds
ago. The driver has not received any packets from the server. An error
occurred (BadRequestException) when calling the ExecuteStatement
operation: Communications link failure
How can I solve this issue? I am using the standard boto3 library.
Here is the reply from AWS Premium Business Support.
Summary: It is an expected behavior
Detailed answer:
I can see that you receive this error when your Aurora Serverless
instance is inactive and you stop receiving it once your instance is
active and accepting connection. Please note that this is an expected
behavior. In general, Aurora Serverless works differently than
Provisioned Aurora , In Aurora Serverless, while the cluster is
"dormant" it has no compute resources assigned to it and when a db.
connection is received, Compute resources are assigned. Because of
this behavior, you will have to "wake up" the clusters and it may take
a few minutes for the first connection to succeed as you have seen.
In order to avoid that you may consider increasing the timeout on the
client side. Also, if you have enabled Pause, you may consider
disabling it [2]. After disabling Pause, you can also adjust the
minimum Aurora capacity unit to higher value to make sure that your
Cluster always having enough computing resource to serve the new
connections [3]. Please note that adjusting the minimum ACU might
increase the cost of service [4].
Also note that Aurora Serverless is only recommend for certain
workloads [5]. If your workload is highly predictable and your
application needs to access the DB on a regular basis, I would
recommend you use Provisioned Aurora cluster/instance to insure high
availability of your business.
[2] How Aurora Serverless Works - Automatic Pause and Resume for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.how-it-works.html#aurora-serverless.how-it-works.pause-resume
[3] Setting the Capacity of an Aurora Serverless DB Cluster - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.setting-capacity.html
[4] Aurora Serverless Pricing - https://aws.amazon.com/rds/aurora/serverless/
[5] Using Amazon Aurora Serverless - Use Cases for Aurora Serverless - https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/aurora-serverless.html#aurora-serverless.use-cases
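The support reply suggests disabling Pause and raising the minimum ACU. A minimal sketch of how that might be done from Python follows, assuming an Aurora Serverless v1 cluster and a client that behaves like `boto3.client("rds")`. The helper names `build_scaling_configuration` and `disable_auto_pause`, and the capacity values in the usage note, are illustrative and not part of the original thread.

```python
def build_scaling_configuration(min_capacity, max_capacity, auto_pause=False):
    """Build a ScalingConfiguration dict for an Aurora Serverless v1 cluster."""
    if min_capacity > max_capacity:
        raise ValueError("min_capacity must not exceed max_capacity")
    return {
        "MinCapacity": min_capacity,
        "MaxCapacity": max_capacity,
        "AutoPause": auto_pause,  # False keeps compute assigned while idle
    }


def disable_auto_pause(rds_client, cluster_id, min_capacity, max_capacity):
    """Turn off auto-pause and set the capacity range on one cluster.

    rds_client is assumed to behave like boto3.client("rds").
    """
    return rds_client.modify_db_cluster(
        DBClusterIdentifier=cluster_id,
        ScalingConfiguration=build_scaling_configuration(min_capacity, max_capacity),
    )
```

It could be called as `disable_auto_pause(boto3.client("rds"), "my-cluster", 2, 8)`, where the cluster identifier is a placeholder. As the reply notes, a cluster that never pauses is billed for its minimum capacity around the clock.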
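In case it is useful to someone, this is how I manage retries while Aurora Serverless wakes up.
The client returns a BadRequestException, so boto3 will not retry even if you change the client's retry configuration; see https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html.
My first option was to try Waiters, but RDSData does not have any waiter. I then tried to create a custom Waiter with an error matcher, but it only matches the error code and ignores the message. Because a BadRequestException can also be raised by an error in a SQL statement, I needed to validate the message too, so I wrote a kind of waiter function: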
    import logging
    import time

    import boto3
    from botocore.exceptions import ClientError

    logger = logging.getLogger()
    rds_data = boto3.client('rds-data')
    # DB_NAME, CLUSTER_ARN and SECRET_ARN are defined elsewhere in the application.

    def _wait_for_serverless():
        delay = 5
        max_attempts = 10
        attempt = 0
        while attempt < max_attempts:
            attempt += 1
            try:
                # Cheap probe query; success means the cluster is awake.
                rds_data.execute_statement(
                    database=DB_NAME,
                    resourceArn=CLUSTER_ARN,
                    secretArn=SECRET_ARN,
                    sql='SELECT * FROM dummy'
                )
                return
            except ClientError as ce:
                error_code = ce.response.get("Error").get('Code')
                error_msg = ce.response.get("Error").get('Message')
                # Aurora serverless is waking up
                if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                    logger.info('Sleeping %s secs, waiting RDS connection', delay)
                    time.sleep(delay)
                else:
                    # Any other error (for example bad SQL) is not a wake-up problem.
                    raise
        raise Exception('Waited for RDS Data but still getting error')
I use it like this:
    def begin_rds_transaction():
        _wait_for_serverless()
        return rds_data.begin_transaction(
            database=DB_NAME,
            resourceArn=CLUSTER_ARN,
            secretArn=SECRET_ARN
        )
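A variation on `_wait_for_serverless` is to retry the real call instead of sending a probe query first. The sketch below is one way to do that, not the answer author's code. It assumes the raised exception carries a `response` dict shaped like botocore's `ClientError` (`{"Error": {"Code": ..., "Message": ...}}`). The names `is_waking_up` and `call_with_wakeup_retry` are hypothetical, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time


def is_waking_up(error):
    """Return True if an error dict looks like the wake-up failure."""
    return (
        error.get("Code") == "BadRequestException"
        and "Communications link failure" in error.get("Message", "")
    )


def call_with_wakeup_retry(call, max_attempts=10, delay=5, sleep=time.sleep):
    """Run call(); retry only while the cluster appears to be waking up."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Assumption: the exception exposes .response like botocore's ClientError.
            response = getattr(exc, "response", None) or {}
            error = response.get("Error", {})
            # Re-raise anything that is not the wake-up error, and give up
            # without a further wait once the last attempt has failed.
            if not is_waking_up(error) or attempt == max_attempts:
                raise
            sleep(delay)
```

With the names from the answer, the transaction could then be started as `call_with_wakeup_retry(lambda: rds_data.begin_transaction(database=DB_NAME, resourceArn=CLUSTER_ARN, secretArn=SECRET_ARN))`. An error caused by bad SQL shares the `BadRequestException` code but not the message, so it propagates immediately.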
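I also ran into this issue and, taking inspiration from the solution used by Arless and the conversation with Jimbo, came up with the following workaround.
I defined a decorator that retries the serverless RDS request until the configurable retry duration expires.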
    import functools
    import logging
    import time

    from sqlalchemy import exc

    logger = logging.getLogger()

    def retry_if_db_inactive(max_attempts, initial_interval, backoff_rate):
        """
        Retry the function if the serverless DB is still in the process of 'waking up'.
        The retry configuration follows the same concepts as AWS Step Functions retries.
        :param max_attempts: The maximum number of retry attempts
        :param initial_interval: The initial duration to wait (in seconds) when the first 'Communications link failure' error is encountered
        :param backoff_rate: The factor to use to multiply the previous interval duration, for the next interval
        :return: A decorator that adds the retry behavior to the wrapped function
        """
        def decorate_retry_if_db_inactive(func):
            @functools.wraps(func)
            def wrapper_retry_if_inactive(*args, **kwargs):
                interval_secs = initial_interval
                attempt = 0
                while attempt < max_attempts:
                    attempt += 1
                    try:
                        return func(*args, **kwargs)
                    except exc.StatementError as err:
                        # err.orig is the underlying driver error; only botocore-style
                        # errors carry a 'response' dict.
                        if hasattr(err.orig, 'response'):
                            error_code = err.orig.response["Error"]['Code']
                            error_msg = err.orig.response["Error"]['Message']
                            # Aurora serverless is waking up
                            if error_code == 'BadRequestException' and 'Communications link failure' in error_msg:
                                logger.info('Sleeping for %s secs, awaiting RDS connection', interval_secs)
                                time.sleep(interval_secs)
                                interval_secs = interval_secs * backoff_rate
                            else:
                                raise
                        else:
                            raise
                raise Exception('Waited for RDS Data but still getting error')
            return wrapper_retry_if_inactive
        return decorate_retry_if_db_inactive
It can then be used like this:
    @retry_if_db_inactive(max_attempts=4, initial_interval=10, backoff_rate=2)
    def insert_alert_to_db(sqs_alert):
        with db_session_scope() as session:
            # your db code
            session.add(sqs_alert)
        return None
Please note that I am using sqlalchemy, so the code would need tweaking to suit specific purposes, but hopefully it is useful as a starter.
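To get a feel for how long the decorator can block, it may help to work out the wait schedule for the parameters used above. The helper below is a hypothetical illustration (`backoff_schedule` is not part of the answer). It mirrors the loop in `retry_if_db_inactive`, which sleeps after every failed attempt, including the last one before giving up.

```python
def backoff_schedule(max_attempts, initial_interval, backoff_rate):
    """List the sleep durations used if every attempt hits the wake-up error."""
    waits = []
    interval = initial_interval
    for _ in range(max_attempts):
        waits.append(interval)
        interval = interval * backoff_rate
    return waits


waits = backoff_schedule(max_attempts=4, initial_interval=10, backoff_rate=2)
print(waits)       # [10, 20, 40, 80]
print(sum(waits))  # 150
```

So with `max_attempts=4, initial_interval=10, backoff_rate=2`, a cluster that never wakes up keeps the caller waiting for about 150 seconds before the final exception is raised. That matters if the code runs somewhere with a hard timeout.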