How to catch Spark error from shell script
I have a pipeline in AWS Data Pipeline that runs a shell script named shell.sh:
$ spark-submit transform_json.py
Running command on cluster...
[54.144.10.162] Running command...
[52.206.87.30] Running command...
[54.144.10.162] Command complete.
[52.206.87.30] Command complete.
run_command finished in 0:00:06.
The AWS Data Pipeline console shows the job as "FINISHED", but in the stderr log I can see that the job was actually aborted:
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Status Code: 404, AWS Service: Amazon S3, AWS Request ID: xxxxx, AWS Error Code: null, AWS Error Message: Not Found...
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 5.0 failed 1 times, most recent failure: Lost task 0.0 in stage 5.0 (TID 5, localhost, executor driver): org.apache.spark.SparkException: Task failed while writing rows.
...
20/05/22 11:42:47 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
20/05/22 11:42:47 INFO MemoryStore: MemoryStore cleared
20/05/22 11:42:47 INFO BlockManager: BlockManager stopped
20/05/22 11:42:47 INFO BlockManagerMaster: BlockManagerMaster stopped
20/05/22 11:42:47 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
20/05/22 11:42:47 INFO SparkContext: Successfully stopped SparkContext
20/05/22 11:42:47 INFO ShutdownHookManager: Shutdown hook called
I'm fairly new to Data Pipeline and Spark, and I can't figure out what is actually happening behind the scenes. How do I get the shell script to catch the SparkException?
Try something like the example below...
Your shell script can catch the error code like this... where any non-zero exit code indicates an error.
$? is the exit status of the most recently executed command; by convention, 0 means success and anything else indicates failure.
spark-submit transform_json.py
ret_code=$?
# Propagate any non-zero exit code from spark-submit so the failure surfaces in Data Pipeline
if [ $ret_code -ne 0 ]; then
  exit $ret_code
fi
On the Python side, you have to code the script to return an exit code, for example via sys.exit(-1), in the error case. See Python exception handling for this...
Check this: Exit codes in Python
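As a rough sketch of what that could look like (the actual contents of transform_json.py are not shown in the question, so the S3 paths and the transformation below are placeholders), assuming a PySpark driver:

import sys
import traceback

from pyspark.sql import SparkSession

def main():
    spark = SparkSession.builder.appName("transform_json").getOrCreate()
    try:
        # Placeholder transformation: the real job logic is not in the question.
        df = spark.read.json("s3://your-bucket/input/")                  # hypothetical input path
        df.write.mode("overwrite").parquet("s3://your-bucket/output/")   # hypothetical output path
    except Exception:
        # Print the failure (including any SparkException) to stderr and
        # exit non-zero so the shell script's $ret_code check fires.
        traceback.print_exc()
        sys.exit(1)
    finally:
        spark.stop()

if __name__ == "__main__":
    main()

With this in place, spark-submit should exit with a non-zero code when the job fails, the shell script propagates that code, and Data Pipeline should then report the activity as failed instead of FINISHED.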