How do I call DeltaLog checkpoint() when nothing has changed since the previous checkpoint?
Can anyone help me understand how DeltaLog checkpoints work? I am trying to create regular checkpoints at a fixed interval, but if there have been no changes to the Delta table since the previous checkpoint was created, I end up with the following error:
An error was encountered:
java.lang.IllegalStateException: State of the checkpoint doesn't match that of the snapshot.
at org.apache.spark.sql.delta.Checkpoints$.writeCheckpoint(Checkpoints.scala:328)
at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles(Checkpoints.scala:145)
at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles$(Checkpoints.scala:144)
at org.apache.spark.sql.delta.DeltaLog.writeCheckpointFiles(DeltaLog.scala:59)
at org.apache.spark.sql.delta.Checkpoints.$anonfun$checkpoint(Checkpoints.scala:137)
at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
at org.apache.spark.sql.delta.DeltaLog.recordOperation(DeltaLog.scala:59)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:106)
at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:91)
at org.apache.spark.sql.delta.DeltaLog.recordDeltaOperation(DeltaLog.scala:59)
at org.apache.spark.sql.delta.Checkpoints.checkpoint(Checkpoints.scala:133)
at org.apache.spark.sql.delta.Checkpoints.checkpoint$(Checkpoints.scala:132)
at org.apache.spark.sql.delta.DeltaLog.checkpoint(DeltaLog.scala:59)
... 55 elided
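For context, the scheduled job boils down to grabbing the table's DeltaLog and calling checkpoint() on it. Below is the workaround I am considering: a minimal sketch, not my exact code, where tablePath and the driver-side lastCheckpointedVersion bookkeeping are assumptions of the sketch, and the no-argument checkpoint() matches the call in the stack trace above (newer Delta releases may expect a Snapshot argument instead). The idea is to only call checkpoint() when the table has actually advanced:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.delta.DeltaLog

// Sketch of a guarded fixed-interval checkpoint: only call checkpoint()
// when at least one commit has landed since the version we checkpointed
// last time. `lastCheckpointedVersion` is driver-side bookkeeping in this
// sketch, not a Delta Lake API.
def maybeCheckpoint(spark: SparkSession,
                    tablePath: String,
                    lastCheckpointedVersion: Long): Long = {
  val deltaLog = DeltaLog.forTable(spark, tablePath)
  // update() refreshes the DeltaLog's in-memory state and returns the latest snapshot
  val snapshot = deltaLog.update()
  if (snapshot.version > lastCheckpointedVersion) {
    deltaLog.checkpoint()   // there is something new to checkpoint
    snapshot.version
  } else {
    lastCheckpointedVersion // no new commits since last time: skip the call
  }
}

Is guarding on the snapshot version like this a reasonable way to avoid the IllegalStateException, or am I misusing the API?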
To reiterate the statements from the Delta Users #deltalake-oss Slack channel:
- Checkpoints are created automatically (by default every 10 commits) (see more here, and the interval-tuning sketch after this list)
- This makes me wonder why you want to create checkpoints manually at a fixed interval when checkpointing is already done for you
- Even when old checkpoints are removed during log cleanup, the new checkpoint still contains the entire state of the table at that point in time, so losing old checkpoints does not invalidate newer data.
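If the goal is simply more (or less) frequent checkpoints rather than running them yourself, one option is to let Delta keep checkpointing automatically and tune the interval through the delta.checkpointInterval table property. A sketch, assuming a path-based Delta table and a SparkSession named spark in scope; the table path is a placeholder:

// Sketch: rely on Delta's automatic checkpointing and tune how often it runs
// via the delta.checkpointInterval table property (default is 10 commits).
// The table path below is a placeholder.
spark.sql(
  """ALTER TABLE delta.`/path/to/table`
    |SET TBLPROPERTIES ('delta.checkpointInterval' = '5')""".stripMargin)

With that in place the manual checkpoint() call, and the error it hits on an unchanged table, can be dropped entirely.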