AWS Glue table missing - Pyspark error Py4JJavaError (error while saving table)

Question

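I'm seeing unusual behavior with one specific Glue table (something I have never seen before). In this case it is a table created by a Spark job (scheduled with Airflow).

Basically, the job extracts a single table from the data warehouse and writes it to a table in S3/Glue, overwriting the existing partitions (the save mode is overwrite). For some reason the job failed today, and this is the exception it raised.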

py4j.protocol.Py4JJavaError: An error occurred while calling o108.saveAsTable.
java.lang.AssertionError: assertion failed: Expect the table customer_cdr has been dropped when the save mode is Overwrite
at scala.Predef$.assert(Predef.scala:170)
at org.apache.spark.sql.execution.command.CreateDataSourceTableAsSelectCommand.run(createDataSourceTables.scala:155)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
at org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
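For context, the kind of write described above can be sketched as follows. This is only an illustration, not the actual job: the function name `overwrite_table`, the table name and the partition columns are hypothetical, and `df` is assumed to be a `pyspark.sql.DataFrame` already loaded from the warehouse.

```python
def overwrite_table(df, table_name, partition_cols=()):
    """Write `df` to a catalog table using save mode "overwrite".

    `df` is assumed to be a pyspark.sql.DataFrame. `table_name` and
    `partition_cols` are placeholders, not values taken from the real job.
    """
    writer = df.write.mode("overwrite")
    if partition_cols:
        # Partition the output by the given columns, if any were supplied.
        writer = writer.partitionBy(*partition_cols)
    # saveAsTable is the call named in the stack trace (o108.saveAsTable).
    writer.saveAsTable(table_name)
```

A call would look like `overwrite_table(df, "my_database.customer_cdr", ["dt"])`, where the database name and the `dt` column are again assumptions.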

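At first, my colleagues and I thought this was just a Spark error on the EMR cluster and that resetting the cluster would fix it. But then we saw something even stranger.

After the incident the table disappeared from the catalog (it was not visible in the Glue console, nor in Athena). But here is the catch: the table still exists, it is just hidden. We cannot find it with the search tool in the Glue console, but we can open it in the console by substituting the table name in the URL, we can query its data from Athena, and we can even retrieve the table from the CLI with the get-table command.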
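The mismatch described above (direct lookup works, listing does not) can be checked programmatically. Below is a minimal sketch, assuming `glue` is a boto3 Glue client and that the caller has permission for `GetTable` and `GetTables`; the function name `check_table_visibility` and the database name in the usage note are hypothetical.

```python
def check_table_visibility(glue, database, table):
    """Compare a direct GetTable lookup with the GetTables listing.

    `glue` is assumed to be a boto3 Glue client, e.g. boto3.client("glue").
    Returns a dict with two booleans: "direct_lookup" and "listed".
    """
    # Direct lookup by name (the same operation as `aws glue get-table`).
    try:
        glue.get_table(DatabaseName=database, Name=table)
        direct = True
    except glue.exceptions.EntityNotFoundException:
        direct = False

    # Walk the paginated table listing of the database and look for the name.
    listed = False
    paginator = glue.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database):
        if any(t["Name"] == table for t in page["TableList"]):
            listed = True
            break

    return {"direct_lookup": direct, "listed": listed}
```

For example, `check_table_visibility(boto3.client("glue"), "my_database", "customer_cdr")` returning `direct_lookup` true and `listed` false would match the "hidden table" symptom.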

We tried to delete the table (from both the console and the CLI), but we ran into the following error:

An error occurred (EntityNotFoundException) when calling the DeleteTable operation: Table (v_ntfm_merchantlogstatus) not found

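It is almost as if the table had been removed from Lake Formation. Now, the question is: has anyone run into a problem like this, and how did you go about debugging it? Thanks!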

Answer 1

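In case anyone else runs into a strange problem like this one: AWS reported that Glue had an anomalous behavior that caused the table to "disappear" (in quotes because the table still exists; it is just not visible to the user groups that have our role, only to administrators or the root user of the given account).

So, what should you do in this situation? In this particular case, contact AWS to resolve the incident (it was the only possible route).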
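When opening the support case, it may help to record which principals currently hold grants on the table. The sketch below assumes the catalog is governed by Lake Formation (the question only hints at this) and that `lakeformation` is a boto3 Lake Formation client whose caller is allowed to call `ListPermissions`; the function name `table_grants` is hypothetical. It is a diagnostic aid only and does not fix the incident.

```python
def table_grants(lakeformation, database, table):
    """Return {principal identifier: sorted permissions} for one table.

    `lakeformation` is assumed to be boto3.client("lakeformation").
    """
    grants = {}
    kwargs = {"Resource": {"Table": {"DatabaseName": database, "Name": table}}}
    while True:
        resp = lakeformation.list_permissions(**kwargs)
        for entry in resp.get("PrincipalResourcePermissions", []):
            principal = entry["Principal"]["DataLakePrincipalIdentifier"]
            # Merge permissions if the same principal appears more than once.
            grants.setdefault(principal, set()).update(entry.get("Permissions", []))
        token = resp.get("NextToken")
        if not token:
            break
        # Request the next page of results.
        kwargs["NextToken"] = token
    return {p: sorted(perms) for p, perms in grants.items()}
```

If the role used by the job is absent from the result while an administrator principal is present, that would be consistent with the visibility difference described above.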

Cheers,

amazon-web-services

pyspark

amazon-athena

aws-glue

aws-lake-formation