使用 java 代码将 spark 作业提交到 AWS EMR 并等待执行并获得最终状态

Submit spark job to AWS EMR with java code and wait for the execution and get final status

我正在尝试通过 AWS EMR SDK API 将 Spark 作业提交到 AWS EMR。 我希望进程提交作业,然后等待作业 complete/fail 并获得相应的状态。

代码:

    AWSCredentials credentials = new BasicAWSCredentials(accessKey, secretKey);
AmazonElasticMapReduce emr =
        AmazonElasticMapReduceClientBuilder
                .standard()
                .withCredentials(new AWSStaticCredentialsProvider(credentials))
                .build();

HadoopJarStepConfig sparkStepConf =
        new HadoopJarStepConfig()
                .withJar("command-runner.jar")
                .withArgs("spark-submit")
                .withArgs("--master", "yarn")
                .withArgs(sparkJarPath)
                .withArgs(args);

StepConfig sparkStep =
        new StepConfig().withName("Spark Step").withActionOnFailure(ActionOnFailure.CONTINUE).withHadoopJarStep(
                sparkStepConf);

AddJobFlowStepsRequest req =
        new AddJobFlowStepsRequest().withJobFlowId(clusterId).withSteps(Collections.singletonList(sparkStep));
emr.addJobFlowSteps(req);

找不到可获取已提交作业状态的内容

这是一个例子(请检查某些代码区域是否为空):

ListStepsResult stepsResult = emr.listSteps(new ListStepsRequest().withClusterId(clusterId).withStepIds(req.getStepIds()));
List<StepSummary> stepsList = stepsResult.getSteps();
StepSummary stepSummary = stepsList.get(0);
StepStatus stepSummaryStatus = stepSummary.getStatus();
String stepStatus = stepSummaryStatus.getState();
StepExecutionState stepState = StepExecutionState.valueOf(stepStatus);

stepState有你想要的。