Data Pipeline failing for EMR Activity

I am trying to run a Spark step on AWS Data Pipeline. I am getting the following exception:

amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
    at amazonaws.datapipeline.taskrunner.TaskPoller.run(TaskPoller.java:81)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException: EMR job '@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' with jobFlowId 'j-2E7PU1OK3GIJI' is failed with status 'FAILED' and reason 'Cluster ready after last step completed.'. Step 'df-0693981356F3KEDFQ6GG_@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' is in status 'FAILED' with reason 'null'
    at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
    ... 7 more

The cluster starts up correctly.

Here is a screenshot of the pipeline:

I think there is some problem with the 'step' in the activity. Any input would be helpful.

The problems were:

1) The script in the step should be comma-separated. Something like this:

command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main

Link: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrcluster.html

2) EmrActivity does not support staging, so we cannot use ${INPUT1_STAGING_DIR} in the step. For now, I have replaced it with a hard-coded S3 URL.
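Both fixes can be checked before the pipeline is deployed. Below is a minimal Python sketch, not part of any AWS SDK: the helper name build_emr_step, the regular expression, and the s3://my-bucket/... paths are all hypothetical and only illustrate the idea. It joins the step arguments with commas and refuses any argument that still contains a staging variable such as ${INPUT1_STAGING_DIR}. It also rejects arguments containing a literal comma, on the assumption that such a comma would be treated as an argument separator.

```python
import re

# Matches Data Pipeline staging variables such as ${INPUT1_STAGING_DIR}.
# (Pattern written for this sketch; adjust it if your variables differ.)
STAGING_VAR = re.compile(r"\$\{(?:INPUT|OUTPUT)\d+_STAGING_DIR\}")


def build_emr_step(args):
    """Join step arguments into one comma-separated string for EmrActivity."""
    for arg in args:
        if "," in arg:
            # Assumption: an unescaped comma would split the argument in two.
            raise ValueError(f"argument contains a comma: {arg!r}")
        if STAGING_VAR.search(arg):
            raise ValueError(
                f"staging variable is not available in EmrActivity: {arg!r}"
            )
    return ",".join(args)


if __name__ == "__main__":
    step = build_emr_step([
        "command-runner.jar",
        "spark-submit",
        "--deploy-mode", "cluster",
        "--class", "com.amazon.Main",
        "s3://my-bucket/jars/app.jar",  # hypothetical jar location
        "s3://my-bucket/input/",        # hard-coded S3 URL instead of staging
    ])
    print(step)
```

The resulting string is what would go into the 'step' field of the EmrActivity.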