AWS Glue job dependency in Step Functions

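I created two Glue jobs (gluejob1, gluejob2).

I want to create a dependency so that gluejob2 runs only after gluejob1 has completed.

To orchestrate this, I created a Step Function with the following definition: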

{
  "gluejob1": {
    "Type": "Task",
    "Resource": "gluejob1.Arn",
    "Comment": "Glue job1.",
    "Next": "gluejob2"
  },
  "gluejob2": {
    "Type": "Task",
    "Resource": "gluejob2.Arn",
    "Comment": "Glue job2.",
    "Next": "Gluejob2 Finished Loading"
  },
  "Gluejob2 Finished Loading": {
    "Type": "Pass",
    "Result": "",
    "End": true
  }
}

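When I execute this Step Function, the state machine treats the gluejob1 state as successful as soon as it has triggered gluejob1, and immediately goes on to trigger gluejob2.

I would like to know whether it is possible to run gluejob2 only after gluejob1 has completed.

Why not use triggers in Glue to handle the dependency?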
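As a rough sketch of that trigger-based alternative, the snippet below builds the arguments for a conditional Glue trigger that starts gluejob2 once gluejob1 reaches the SUCCEEDED state. The trigger name and the helper function names are made up for illustration, and the call assumes boto3 is installed with credentials that are allowed to create Glue triggers. As far as I know, a conditional trigger only fires when the watched job run was itself started by a trigger, so gluejob1 would also need its own scheduled or on-demand trigger.

```python
def build_conditional_trigger(upstream_job, downstream_job, trigger_name):
    """Return keyword arguments for Glue's CreateTrigger API (hypothetical helper)."""
    return {
        "Name": trigger_name,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,  # activate the trigger as soon as it is created
        "Predicate": {
            "Conditions": [
                {
                    # Fire only when the upstream job run has succeeded.
                    "LogicalOperator": "EQUALS",
                    "JobName": upstream_job,
                    "State": "SUCCEEDED",
                }
            ]
        },
        "Actions": [{"JobName": downstream_job}],
    }


def create_trigger(params):
    """Send the request to AWS; needs boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works without AWS access

    glue = boto3.client("glue")
    return glue.create_trigger(**params)


# The trigger name is an assumption; pick whatever fits your naming scheme.
params = build_conditional_trigger(
    "gluejob1", "gluejob2", "run-gluejob2-after-gluejob1"
)
# create_trigger(params)  # uncomment to actually create the trigger in AWS
```

With this approach the dependency lives entirely inside Glue, so no state machine is needed for a simple two-job chain.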

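You can invoke the Glue job from Step Functions synchronously, so that the state waits for the job to complete. The .sync suffix on the resource ARN is what makes the Task state wait for the job run to finish before moving to the next state: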

{
  "StartAt": "gluejob1",
  "States": {
    "gluejob1": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "ETLJobName1"
      },
      "Next": "gluejob2"
    },
    "gluejob2": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "ETLJobName2"
      },
      "Next": "Gluejob2 Finished Loading"
    },
    "Gluejob2 Finished Loading": {
      "Type": "Pass",
      "Result": "",
      "End": true
    }
  }
}
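If more jobs need to be chained the same way, the definition can be generated rather than written by hand. The following is only a sketch: the function name build_sequential_definition is invented here, and it assumes each state can simply be named after its Glue job, so the job names in the list must be unique. It prints JSON that can be pasted into the state machine definition.

```python
import json

# Service integration ARN that makes Step Functions wait for the job run to finish.
SYNC_RESOURCE = "arn:aws:states:::glue:startJobRun.sync"


def build_sequential_definition(job_names):
    """Build a state machine definition that runs the Glue jobs one after another."""
    if not job_names:
        raise ValueError("at least one job name is required")

    states = {}
    for index, name in enumerate(job_names):
        state = {
            "Type": "Task",
            "Resource": SYNC_RESOURCE,
            # Plain "JobName" (no ".$" suffix) because the value is a literal,
            # not a JSONPath into the state input.
            "Parameters": {"JobName": name},
        }
        if index + 1 < len(job_names):
            state["Next"] = job_names[index + 1]  # hand over to the following job
        else:
            state["End"] = True  # last job ends the execution
        states[name] = state

    return {"StartAt": job_names[0], "States": states}


definition = build_sequential_definition(["gluejob1", "gluejob2"])
print(json.dumps(definition, indent=2))
```

Each job starts only after the previous state has finished, and a failed job run fails its state, so later jobs are not started unless error handling is added.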