Set Subnet ID and EC2 Key Name in EMR Cluster Config via Step Functions

As of November 2019, AWS Step Functions natively supports orchestrating EMR clusters. So we are trying to provision a cluster and run some jobs on it.

We couldn't find any documentation on how to set the SubnetId, or the key name used for the EC2 instances in the cluster. Is this possible?

Currently, our create-cluster step looks like this:

"States": {
          "Create an EMR cluster": {
            "Type": "Task",
            "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
            "Parameters": {
              "Name": "TestCluster",
              "VisibleToAllUsers": true,
              "ReleaseLabel": "emr-5.26.0",
              "Applications": [
                { "Name": "spark" }
              ],
              "ServiceRole": "SomeRole",
              "JobFlowRole": "SomeInstanceProfile",
              "LogUri": "s3://some-logs-bucket/logs",
              "Instances": {
                "KeepJobFlowAliveWhenNoSteps": true,
                "InstanceFleets": [
                  {
                    "Name": "MasterFleet",
                    "InstanceFleetType": "MASTER",
                    "TargetOnDemandCapacity": 1,
                    "InstanceTypeConfigs": [
                      {
                        "InstanceType": "m3.2xlarge"
                      }
                    ]
                  },
                  {
                    "Name": "CoreFleet",
                    "InstanceFleetType": "CORE",
                    "TargetSpotCapacity": 2,
                    "InstanceTypeConfigs": [
                      {
                        "InstanceType": "m3.2xlarge",
                        "BidPriceAsPercentageOfOnDemandPrice": 100
                      }
                    ]
                  }
                ]
              }
            },
            "ResultPath": "$.cluster",
            "End": "true"
          }
}

As soon as we add a "SubnetId" key to Parameters itself or to any object nested inside it, we get this error:

Invalid State Machine Definition: 'SCHEMA_VALIDATION_FAILED: The field "SubnetId" is not supported by Step Functions at /States/Create an EMR cluster/Parameters' (Service: AWSStepFunctions; Status Code: 400; Error Code: InvalidDefinition;

Looking at the Step Functions documentation for the EMR integration, we can see that createCluster.sync uses the EMR API RunJobFlow. In RunJobFlow we can specify Ec2KeyName and Ec2SubnetId, which live at the paths $.Instances.Ec2KeyName and $.Instances.Ec2SubnetId.
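If you assemble the Parameters block in code before serializing it, a quick local check can confirm that the two RunJobFlow fields sit under Instances and that no bare SubnetId key (the one Step Functions rejected) is left anywhere. This is a minimal sketch; the helpers get_path and find_keys are hypothetical, and this is only a sanity check, not the Step Functions schema validator.

```python
from typing import Any


def get_path(doc: dict, path: str) -> Any:
    """Hypothetical helper: resolve a simple "$.A.B" path in nested dicts, or return None."""
    node: Any = doc
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def find_keys(node: Any, name: str, prefix: str = "$") -> list:
    """Hypothetical helper: list the path of every key exactly equal to `name`."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            here = f"{prefix}.{key}"
            if key == name:
                hits.append(here)
            hits.extend(find_keys(value, name, here))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits.extend(find_keys(item, name, f"{prefix}[{i}]"))
    return hits


# Trimmed-down Parameters block; the placeholders are the same as in the answer below.
parameters = {
    "Name": "TestCluster",
    "ReleaseLabel": "emr-5.26.0",
    "Instances": {
        "Ec2KeyName": "ENTER_EC2KEYNAME_HERE",
        "Ec2SubnetId": "ENTER_EC2SUBNETID_HERE",
        "KeepJobFlowAliveWhenNoSteps": True,
    },
}

print(get_path(parameters, "$.Instances.Ec2SubnetId"))
print(find_keys(parameters, "SubnetId"))  # no stray "SubnetId" keys left
```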

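That said, I managed to create a state machine with the following definition (side note: your definition has an error: "End": "true" should be "End": true, a boolean rather than a string):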
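A string "true" is still valid JSON, so it is easy to miss by eye. The sketch below flags any state whose End field is not a JSON boolean. The function name is hypothetical, and it only inspects the top-level States map, not states nested inside Parallel or Map branches.

```python
import json


def non_boolean_end_fields(definition: str) -> list:
    """Hypothetical lint: names of top-level states whose "End" value is not a boolean."""
    doc = json.loads(definition)
    problems = []
    for name, state in doc.get("States", {}).items():
        # json.loads turns JSON true into Python True, but "true" stays a str.
        if "End" in state and not isinstance(state["End"], bool):
            problems.append(name)
    return problems


bad = '{"StartAt": "A", "States": {"A": {"Type": "Pass", "End": "true"}}}'
good = '{"StartAt": "A", "States": {"A": {"Type": "Pass", "End": true}}}'
print(non_boolean_end_fields(bad))
print(non_boolean_end_fields(good))
```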

{
"Comment": "Create an EMR cluster in a given subnet with a given EC2 key pair",
"StartAt": "Create an EMR cluster",
"States": {
    "Create an EMR cluster": {
        "Type": "Task",
        "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
        "Parameters": {
            "Name": "TestCluster",
            "VisibleToAllUsers": true,
            "ReleaseLabel": "emr-5.26.0",
            "Applications": [
                {
                    "Name": "spark"
                }
            ],
            "ServiceRole": "SomeRole",
            "JobFlowRole": "SomeInstanceProfile",
            "LogUri": "s3://some-logs-bucket/logs",
            "Instances": {
                "Ec2KeyName": "ENTER_EC2KEYNAME_HERE",
                "Ec2SubnetId": "ENTER_EC2SUBNETID_HERE",
                "KeepJobFlowAliveWhenNoSteps": true,
                "InstanceFleets": [
                    {
                        "Name": "MasterFleet",
                        "InstanceFleetType": "MASTER",
                        "TargetOnDemandCapacity": 1,
                        "InstanceTypeConfigs": [
                            {
                                "InstanceType": "m3.2xlarge"
                            }
                        ]
                    },
                    {
                        "Name": "CoreFleet",
                        "InstanceFleetType": "CORE",
                        "TargetSpotCapacity": 2,
                        "InstanceTypeConfigs": [
                            {
                                "InstanceType": "m3.2xlarge",
                                "BidPriceAsPercentageOfOnDemandPrice": 100
                            }
                        ]
                    }
                ]
            }
        },
        "ResultPath": "$.cluster",
        "End": true
    }
}
}
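If the subnet and key pair should vary per execution rather than being hard-coded, Amazon States Language lets a Parameters key ending in ".$" take its value from a JSONPath evaluated against the state input. The sketch below generates the same definition that way. It assumes the execution input carries fields named subnetId and keyName; those names, and the function emr_definition, are hypothetical.

```python
import json


def emr_definition(release_label: str = "emr-5.26.0") -> str:
    """Build the createCluster.sync state machine, reading subnet and key from input."""
    instances = {
        # ".$" suffix: the value is a JSONPath into the state input (hypothetical field names).
        "Ec2KeyName.$": "$.keyName",
        "Ec2SubnetId.$": "$.subnetId",
        "KeepJobFlowAliveWhenNoSteps": True,
        "InstanceFleets": [
            {"Name": "MasterFleet", "InstanceFleetType": "MASTER",
             "TargetOnDemandCapacity": 1,
             "InstanceTypeConfigs": [{"InstanceType": "m3.2xlarge"}]},
            {"Name": "CoreFleet", "InstanceFleetType": "CORE",
             "TargetSpotCapacity": 2,
             "InstanceTypeConfigs": [{"InstanceType": "m3.2xlarge",
                                      "BidPriceAsPercentageOfOnDemandPrice": 100}]},
        ],
    }
    definition = {
        "StartAt": "Create an EMR cluster",
        "States": {
            "Create an EMR cluster": {
                "Type": "Task",
                "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
                "Parameters": {
                    "Name": "TestCluster",
                    "VisibleToAllUsers": True,
                    "ReleaseLabel": release_label,
                    "Applications": [{"Name": "spark"}],
                    "ServiceRole": "SomeRole",
                    "JobFlowRole": "SomeInstanceProfile",
                    "LogUri": "s3://some-logs-bucket/logs",
                    "Instances": instances,
                },
                "ResultPath": "$.cluster",
                "End": True,  # serialized by json.dumps as the boolean true
            }
        },
    }
    return json.dumps(definition, indent=2)


print(emr_definition())
```

An execution would then be started with input such as {"subnetId": "ENTER_EC2SUBNETID_HERE", "keyName": "ENTER_EC2KEYNAME_HERE"}. Because ResultPath is "$.cluster", the createCluster result is added to that input instead of replacing it.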