Retry logic in AWS Step Functions
I am testing the retry logic of a step function.
In theory, the following step function should retry the Lambda 3 times if it fails.
{
  "StartAt": "Bazinga",
  "States": {
    "Bazinga": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:ap-southeast-2:518815385770:function:errorTest:$LATEST",
        "Payload": {
          "Input.$": "$"
        }
      },
      "Retry": [
        {
          "ErrorEquals": [ "States.All", "States.Timeout" ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 1.0
        }
      ],
      "Next": "Fail"
    },
    "Fail": {
      "Type": "Fail"
    }
  }
}
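The Lambda it invokes is configured to time out after 3 seconds, and the Lambda sleeps for 4 seconds. This means the Lambda times out and throws a States.Timeout error. The code is as follows: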
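// Resolve after `ms` milliseconds.
function sleep(ms) {
  return new Promise(resolve => {
    setTimeout(resolve, ms);
  });
}

exports.handler = async (event) => {
  console.log('------------> executing ....');
  // Sleep longer than the 3-second Lambda timeout to force a timeout.
  await sleep(4000);
};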
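The problem is that the step function does not retry the task. This can be confirmed from the following CloudWatch logs, which show only a single invocation.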
05:59:36
START RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Version: $LATEST
05:59:36
2019-07-24T05:59:36.340Z dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 INFO ------------> executing ....
05:59:39
END RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0
05:59:39
REPORT RequestId: dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Duration: 3003.29 ms Billed Duration: 3000 ms Memory Size: 128 MB Max Memory Used: 26 MB
05:59:39
2019-07-24T05:59:39.317Z dd1a2ee9-f389-44be-aaa6-07f2ca7983b0 Task timed out after 3.00 seconds
I am not sure what went wrong. Any help is appreciated, thanks in advance.
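To answer my own question, the retry logic I wrote had two problems:
- States.All should be States.ALL (note the capitalization of the two Ls).
- When the Lambda times out, the error thrown is Lambda.Unknown, not States.Timeout.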
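The two problems can be illustrated with a small sketch of how an error name is compared against ErrorEquals. This is a simplified model, not the actual Step Functions implementation: `matchesRetrier` is a hypothetical helper, and it assumes an exact, case-sensitive comparison with States.ALL as a wildcard, ignoring any special cases the service applies.

```javascript
// Simplified model of ErrorEquals matching (hypothetical helper, not an AWS API).
// Assumption: names are compared exactly and case-sensitively,
// and "States.ALL" acts as a wildcard for any error name.
function matchesRetrier(errorName, errorEquals) {
  return errorEquals.some(
    (name) => name === "States.ALL" || name === errorName
  );
}

// Error name reported in this answer when the Lambda timed out.
const thrown = "Lambda.Unknown";

// Original retrier: "States.All" is misspelled, so nothing matches.
console.log(matchesRetrier(thrown, ["States.All", "States.Timeout"])); // false

// Updated retrier: the error name is listed explicitly.
console.log(matchesRetrier(thrown, ["States.Timeout", "Lambda.Unknown"])); // true

// Correctly spelled wildcard.
console.log(matchesRetrier(thrown, ["States.ALL"])); // true
```

Under this model, the misspelled wildcard is treated as just another error name that never occurs, which would explain why no retry was attempted.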
I updated my step function with the following definition, and now it works:
{
  "StartAt": "Bazinga",
  "States": {
    "Bazinga": {
      "Type": "Task",
      "Resource": "arn:aws:states:::lambda:invoke",
      "Parameters": {
        "FunctionName": "arn:aws:lambda:ap-southeast-2:518815385770:function:errorTest:$LATEST",
        "Payload": {
          "Input.$": "$"
        }
      },
      "Retry": [
        {
          "ErrorEquals": [ "States.Timeout", "Lambda.Unknown" ],
          "IntervalSeconds": 1,
          "MaxAttempts": 3,
          "BackoffRate": 1.0
        }
      ],
      "Next": "Fail"
    },
    "Fail": {
      "Type": "Fail"
    }
  }
}
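To see what this retrier implies in terms of timing, here is a rough sketch that computes the wait before each retry from IntervalSeconds, MaxAttempts, and BackoffRate. `retryDelays` is a hypothetical helper; it assumes the documented behavior that each wait is the previous one multiplied by BackoffRate, and that MaxAttempts counts retries rather than total invocations.

```javascript
// Hypothetical helper: list the wait (in seconds) before each retry.
// Defaults follow the Amazon States Language defaults (1, 3, 2.0).
function retryDelays({ IntervalSeconds = 1, MaxAttempts = 3, BackoffRate = 2.0 }) {
  const delays = [];
  for (let retry = 0; retry < MaxAttempts; retry++) {
    // First retry waits IntervalSeconds; each later wait is scaled by BackoffRate.
    delays.push(IntervalSeconds * Math.pow(BackoffRate, retry));
  }
  return delays;
}

// Retrier values taken from the state machine above.
const retrier = { IntervalSeconds: 1, MaxAttempts: 3, BackoffRate: 1.0 };
const delays = retryDelays(retrier);

console.log(delays); // [ 1, 1, 1 ]
// One initial invocation plus one per retry.
console.log(1 + delays.length); // 4
```

If that assumption holds, the CloudWatch logs for a failing run should show up to four invocations about a second apart (plus the 3-second timeout each), not three.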
This happens because you did not define TimeoutSeconds in the ASL. Example:
"Type": "Task",
"Resource": "${FunctionArn}",
"TimeoutSeconds": 3,
Otherwise the task fails with Lambda.Unknown.
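As a rough sketch of this suggestion, the snippet below adds a state-level TimeoutSeconds to a Task state and prints the resulting definition. `withTimeout` is a hypothetical helper, the Parameters block is omitted for brevity, and the value 3 is simply the one from the example above; whether the state-level timeout or the Lambda's own timeout fires first when both are 3 seconds is not something this sketch can show.

```javascript
// Hypothetical helper: return a copy of a Task state with TimeoutSeconds set.
function withTimeout(taskState, seconds) {
  return { ...taskState, TimeoutSeconds: seconds };
}

// Task state modeled on the one in the question (Parameters omitted).
const task = {
  Type: "Task",
  Resource: "arn:aws:states:::lambda:invoke",
  Retry: [
    {
      ErrorEquals: ["States.Timeout", "Lambda.Unknown"],
      IntervalSeconds: 1,
      MaxAttempts: 3,
      BackoffRate: 1.0,
    },
  ],
  Next: "Fail",
};

// The original object is left unchanged; only the copy gets the timeout.
const patched = withTimeout(task, 3);
console.log(JSON.stringify(patched, null, 2));
```

Keeping both States.Timeout and Lambda.Unknown in ErrorEquals covers either outcome while testing.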