Data Pipeline failing for EMR Activity
I am trying to run a Spark step on AWS Data Pipeline. I am getting the following exception:
amazonaws.datapipeline.taskrunner.TaskExecutionException: Failed to complete EMR transform.
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:67)
    at amazonaws.datapipeline.objects.AbstractActivity.run(AbstractActivity.java:16)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeRemoteRunner(TaskPoller.java:136)
    at amazonaws.datapipeline.taskrunner.TaskPoller.executeTask(TaskPoller.java:105)
    at amazonaws.datapipeline.taskrunner.TaskPoller.run(TaskPoller.java:81)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.executeWork(PollWorker.java:76)
    at private.com.amazonaws.services.datapipeline.poller.PollWorker.run(PollWorker.java:53)
    at java.lang.Thread.run(Thread.java:748)
Caused by: amazonaws.datapipeline.taskrunner.TaskExecutionException: EMR job '@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' with jobFlowId 'j-2E7PU1OK3GIJI' is failed with status 'FAILED' and reason 'Cluster ready after last step completed.'. Step 'df-0693981356F3KEDFQ6GG_@DefaultEmrActivity1_2017-11-20T12:13:08_Attempt=1' is in status 'FAILED' with reason 'null'
    at amazonaws.datapipeline.cluster.EmrUtil.runSteps(EmrUtil.java:286)
    at amazonaws.datapipeline.activity.EmrActivity.runActivity(EmrActivity.java:63)
    ... 7 more
The cluster starts up correctly.
Here is a screenshot of the pipeline:
I think there is some problem with the 'step' in the activity. Any input would be helpful.
The issues were:
1) The step string should be comma-separated. Something like this:
command-runner.jar,spark-submit,--deploy-mode,cluster,--class,com.amazon.Main
Link: http://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emrcluster.html
2) EmrActivity does not support staging, so we cannot use ${INPUT1_STAGING_DIR} in the step command. For now, I have replaced it with a hard-coded S3 URL.
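Both fixes can be checked before the pipeline is activated. The following is only a rough sketch, not part of any AWS SDK: `build_emr_step` is a hypothetical helper, and the `s3://example-bucket/...` paths are placeholder assumptions. It joins the JAR name and its arguments with commas and rejects staging variables such as ${INPUT1_STAGING_DIR}, which EmrActivity does not resolve.

```python
import re

# Matches Data Pipeline staging variables such as ${INPUT1_STAGING_DIR}.
STAGING_VAR = re.compile(r"\$\{(?:INPUT|OUTPUT)\d+_STAGING_DIR\}")


def build_emr_step(jar, args):
    """Build the comma-separated 'step' string for an EmrActivity.

    Hypothetical helper for illustration only.
    """
    parts = [jar, *args]
    for part in parts:
        # A comma inside an argument would be read as a separator.
        if "," in part:
            raise ValueError(f"argument contains a comma: {part!r}")
        # EmrActivity does not support staging, so these are never expanded.
        if STAGING_VAR.search(part):
            raise ValueError(f"staging variable not supported here: {part!r}")
    return ",".join(parts)


# Placeholder S3 locations; replace with real, hard-coded URLs.
step = build_emr_step(
    "command-runner.jar",
    [
        "spark-submit",
        "--deploy-mode", "cluster",
        "--class", "com.amazon.Main",
        "s3://example-bucket/app.jar",
        "s3://example-bucket/input/",
    ],
)
print(step)
```

The resulting string is what goes into the `step` field of the EmrActivity; passing `${INPUT1_STAGING_DIR}` as an argument raises a `ValueError` instead of failing later on the cluster.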