How to call DeltaLog checkpoint() if no change since previous checkpoint was created?

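Can anyone help explain how Delta log checkpoints work? I am trying to create checkpoints at a fixed interval, but if the Delta table has not changed since the last checkpoint was created, I end up with the following error: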

An error was encountered:
java.lang.IllegalStateException: State of the checkpoint doesn't match that of the snapshot.
  at org.apache.spark.sql.delta.Checkpoints$.writeCheckpoint(Checkpoints.scala:328)
  at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles(Checkpoints.scala:145)
  at org.apache.spark.sql.delta.Checkpoints.writeCheckpointFiles$(Checkpoints.scala:144)
  at org.apache.spark.sql.delta.DeltaLog.writeCheckpointFiles(DeltaLog.scala:59)
  at org.apache.spark.sql.delta.Checkpoints.$anonfun$checkpoint(Checkpoints.scala:137)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
  at com.databricks.spark.util.DatabricksLogging.recordOperation(DatabricksLogging.scala:77)
  at com.databricks.spark.util.DatabricksLogging.recordOperation$(DatabricksLogging.scala:67)
  at org.apache.spark.sql.delta.DeltaLog.recordOperation(DeltaLog.scala:59)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation(DeltaLogging.scala:106)
  at org.apache.spark.sql.delta.metering.DeltaLogging.recordDeltaOperation$(DeltaLogging.scala:91)
  at org.apache.spark.sql.delta.DeltaLog.recordDeltaOperation(DeltaLog.scala:59)
  at org.apache.spark.sql.delta.Checkpoints.checkpoint(Checkpoints.scala:133)
  at org.apache.spark.sql.delta.Checkpoints.checkpoint$(Checkpoints.scala:132)
  at org.apache.spark.sql.delta.DeltaLog.checkpoint(DeltaLog.scala:59)
  ... 55 elided
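One possible workaround is to check for new commits before calling `checkpoint()`. This assumes, as the error suggests, that the failure only happens when nothing has been committed since the last checkpoint. The sketch below reads the `_delta_log` directory directly. Commit files there are named by a zero-padded 20-digit version (e.g. `00000000000000000010.json`), and `_delta_log/_last_checkpoint` is a small JSON file whose `version` field records the most recent checkpoint. The helper names are hypothetical and not part of any Delta API. The sketch also assumes the table lives on a local or mounted filesystem; object stores such as S3 would need their own listing call.

```python
import json
import re
from pathlib import Path

# Commit files in _delta_log are named by a zero-padded 20-digit version,
# e.g. 00000000000000000010.json. Checkpoint parquet files do not match this.
COMMIT_FILE = re.compile(r"^(\d{20})\.json$")


def latest_commit_version(table_path):
    """Hypothetical helper: highest commit version in <table>/_delta_log, or None."""
    log_dir = Path(table_path) / "_delta_log"
    versions = [
        int(m.group(1))
        for p in log_dir.iterdir()
        if (m := COMMIT_FILE.match(p.name))
    ]
    return max(versions) if versions else None


def last_checkpoint_version(table_path):
    """Hypothetical helper: version stored in _delta_log/_last_checkpoint, or None."""
    marker = Path(table_path) / "_delta_log" / "_last_checkpoint"
    if not marker.exists():
        return None
    return json.loads(marker.read_text())["version"]


def needs_checkpoint(table_path):
    """True only if at least one commit is newer than the last checkpoint."""
    latest = latest_commit_version(table_path)
    if latest is None:
        return False  # no commits at all, nothing to checkpoint
    checkpointed = last_checkpoint_version(table_path)
    return checkpointed is None or latest > checkpointed
```

A scheduled job would then call `DeltaLog.checkpoint()` only when `needs_checkpoint(path)` returns `True`, and skip that run otherwise.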

Reiterating what was said in the Delta Users #deltalake-oss Slack channel:

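  • Checkpoints are created automatically (by default, every 10 commits) (see more here)
  • This makes me wonder why you would create checkpoints manually at a fixed interval when checkpointing is already done for you.
  • Even when old checkpoints are deleted during log cleanup, the newest checkpoint still contains the entire state of the table as of that point in time. So losing old checkpoints does not invalidate newer data.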
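The first point can be made concrete with a small model of when automatic checkpoints fire. This assumes the rule used by open-source Delta Lake's commit path: a checkpoint is written after commit version v when v is non-zero and divisible by the `delta.checkpointInterval` table property, which defaults to 10. It only models the schedule and does not touch a real table. The function names are hypothetical.

```python
# Assumed default of the delta.checkpointInterval table property.
DEFAULT_CHECKPOINT_INTERVAL = 10


def is_auto_checkpoint_version(version, interval=DEFAULT_CHECKPOINT_INTERVAL):
    """True if committing `version` would trigger an automatic checkpoint."""
    return version != 0 and version % interval == 0


def auto_checkpoint_versions(latest_version, interval=DEFAULT_CHECKPOINT_INTERVAL):
    """Versions up to latest_version (inclusive) that would get an automatic checkpoint."""
    return [v for v in range(latest_version + 1) if is_auto_checkpoint_version(v, interval)]


print(auto_checkpoint_versions(25))  # [10, 20]
```

Comparing this schedule with the cadence of a fixed-interval job shows whether extra manual checkpoints add anything for a given table.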