AWS Step Functions EMR PySpark Task Step failed
I have a working EMR step that takes about 500 seconds. The EMR controller log of the working step shows:
2021-09-19T18:36:59.786Z INFO StepRunner: Created Runner for step 7
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster s3://emr/scripts/aggregate.py --day 20210915 --base_uri s3://'
...
2021-09-19T18:46:35.934Z INFO Step succeeded with exitCode 0 and took 576 seconds
When I try to run the same step via Step Functions, the spark-submit command looks identical, but the step fails:
2021-09-19T18:36:59.786Z INFO StepRunner: Created Runner for step 7
INFO startExec 'hadoop jar /var/lib/aws/emr/step-runner/hadoop-jars/command-runner.jar spark-submit --deploy-mode cluster s3://emr/scripts/aggregate.py --day 20210915 --base_uri s3://'
...
2021-09-22T09:56:07.309Z WARN Step failed with exitCode 1 and took 2 seconds
STDERR shows:
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Cannot run program "spark-submit" (in directory "."): error=2, No such file or directory
What is the correct way to run a PySpark script with Step Functions?
Step definition:
"EMR AddStep - Aggregate Daily": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:addStep.sync",
  "Parameters": {
    "ClusterId.$": "$.cluster.Cluster.Id",
    "Step": {
      "Name": "Aggregate Daily",
      "ActionOnFailure": "CONTINUE",
      "HadoopJarStep": {
        "Jar": "command-runner.jar",
        "Args": [
          "spark-submit",
          "--deploy-mode",
          "cluster",
          "s3://emr/scripts/aggregate.py",
          "--day",
          "20210915",
          "--base_uri",
          "s3://"
        ]
      }
    }
  }
The problem was that the EMR cluster had been created without the Spark application, so spark-submit was not installed on the cluster. The solution is to add "Name": "Spark" under "Applications" in the Step Functions CreateCluster task parameters:
"EMR CreateCluster": {
  "Type": "Task",
  "Resource": "arn:aws:states:::elasticmapreduce:createCluster.sync",
  "Parameters": {
    "Name": "ExampleCluster",
    "VisibleToAllUsers": true,
    "ReleaseLabel": "emr-5.33.0",
    "Applications": [
      { "Name": "Hive" },
      { "Name": "Spark" }
    ],
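As a quick offline sanity check before deploying the state machine, one can assemble the two task parameter blocks in Python and assert that Spark is among the cluster's applications, since a missing Spark entry is exactly what makes spark-submit unavailable. This is just a sketch; the names and values are taken from the snippets above:

```python
import json

# CreateCluster task parameters (fragment). Spark must be listed here,
# otherwise spark-submit is not installed on the cluster nodes.
create_cluster_params = {
    "Name": "ExampleCluster",
    "VisibleToAllUsers": True,
    "ReleaseLabel": "emr-5.33.0",
    "Applications": [
        {"Name": "Hive"},
        {"Name": "Spark"},  # the entry whose absence caused exitCode 1
    ],
}

# AddStep task parameters: command-runner.jar invokes spark-submit.
add_step_params = {
    "Step": {
        "Name": "Aggregate Daily",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit", "--deploy-mode", "cluster",
                "s3://emr/scripts/aggregate.py",
                "--day", "20210915", "--base_uri", "s3://",
            ],
        },
    },
}

# Fail fast if the step needs spark-submit but the cluster lacks Spark.
app_names = {app["Name"] for app in create_cluster_params["Applications"]}
if add_step_params["Step"]["HadoopJarStep"]["Args"][0] == "spark-submit":
    assert "Spark" in app_names, "spark-submit will be missing without Spark"

print(json.dumps(create_cluster_params["Applications"]))
```

A check like this catches the mismatch at authoring time instead of two seconds into a failed EMR step.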