Spark Launcher: Can't see the complete stack trace for failed SQL query
I'm using SparkLauncher to connect to Spark in cluster mode on top of Yarn. I'm running some SQL code through Scala like this:
def execute(code: String): Unit = {
  try {
    val resultDataframe = spark.sql(code)
    resultDataframe.write.json("s3://some/prefix")
  } catch {
    case NonFatal(f) =>
      log.warn(s"Fail to execute query $code", f)
      log.info(f.getMessage, getNestedStackTrace(f, Seq[String]()))
  }
}

def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
  if (e.getCause == null) return msg
  getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}
Now, when I run a query through the execute() method that is supposed to fail, for example querying a partitioned table without a partition predicate - select * from partitioned_table_on_dt limit 1; - I get back an incorrect stack trace.
Correct stack trace when I run spark.sql(code).write.json() manually from spark-shell:
org.apache.spark.sql.catalyst.errors.package$TreeNodeException: execute, tree:
Exchange SinglePartition
+- *(1) LocalLimit 1
at org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)
at org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute.apply(SparkPlan.scala:131)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute.apply(SparkPlan.scala:127)
...
Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: No partition predicate found for partitioned table
partitioned_table_on_dt.
If the table is cached in memory then turn off this check by setting
hive.mapred.mode to nonstrict
at org.apache.spark.sql.hive.execution.HiveTableScanExec.prunePartitions(HiveTableScanExec.scala:155)
...
org.apache.spark.SparkException: Job aborted.
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
at org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
...
Incorrect stack trace from the execute() method above:
Job Aborted:
"org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)",
"org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)",
"org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)",
...
"org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56)",
"org.apache.spark.sql.execution.exchange.ShuffleExchangeExec.doExecute(ShuffleExchangeExec.scala:119)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute.apply(SparkPlan.scala:131)",
"org.apache.spark.sql.execution.SparkPlan$$anonfun$execute.apply(SparkPlan.scala:127)",
...
The spark-shell stack trace has three nested exceptions, SparkException (TreeNodeException (SemanticException)), but the trace I see from my code only covers the SparkException and the TreeNodeException; the most valuable SemanticException trace is missing, even after collecting the nested stack traces with the getNestedStackTrace() method.
Can any Spark/Scala experts tell me what I am doing wrong here, or how I can get the complete stack trace with all of the exceptions?
The recursive method getNestedStackTrace() has a bug: because it returns as soon as e.getCause is null, the innermost exception in the chain (the SemanticException here) never gets its own stack trace appended. The base case should test e itself, not e.getCause:
def getNestedStackTrace(e: Throwable, msg: Seq[String]): Seq[String] = {
  if (e == null) return msg // check e itself, not e.getCause
  getNestedStackTrace(e.getCause, msg ++ e.getStackTrace.map(_.toString))
}
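As a sanity check, here is a minimal, self-contained sketch (the object name, helper names, and the three nested exceptions are made up for illustration; they simply mirror the SparkException -> TreeNodeException -> SemanticException chain) showing how the original base case drops the innermost exception's frames while the fixed one keeps them:

object StackTraceDemo {
  // Buggy version: returns before appending the innermost exception's frames.
  def buggy(e: Throwable, msg: Seq[String]): Seq[String] = {
    if (e.getCause == null) return msg
    buggy(e.getCause, msg ++ e.getStackTrace.map(_.toString))
  }

  // Fixed version: recurses until e is null, so every exception in the
  // cause chain, including the innermost one, contributes its frames.
  def fixed(e: Throwable, msg: Seq[String]): Seq[String] = {
    if (e == null) return msg
    fixed(e.getCause, msg ++ e.getStackTrace.map(_.toString))
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical three-level cause chain for demonstration only.
    val inner  = new IllegalStateException("no partition predicate")
    val middle = new RuntimeException("execute, tree", inner)
    val outer  = new Exception("Job aborted.", middle)

    println(s"buggy frames: ${buggy(outer, Seq.empty).size}") // misses inner's frames
    println(s"fixed frames: ${fixed(outer, Seq.empty).size}") // includes all three exceptions
  }
}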