TensorFlow checkpoints saving data file

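Hi everyone,

I'm using TensorFlow for some machine learning problems, but I've run into something about checkpoints that I can't figure out. Saving a checkpoint produces a meta file, an index file and a data file. But what do the numbers at the end of the data file name mean, e.g. model.ckpt.data-00000-of-00001? And why is it always 00000-of-00001?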

A tf.train.Saver is instantiated with a parameter sharded, which defaults to False.

sharded: If True, shard the checkpoints, one per device.

When you call save(), the documentation says:

Returns: A string: path prefix used for the checkpoint files. If the saver is sharded, this string ends with: '-?????-of-nnnnn' where 'nnnnn' is the number of shards created. If the saver is empty, returns None.

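So if you set sharded=True and train on multiple devices, for example on a GPU cluster, or on a local machine where part of your model lives on the CPU and another part on the GPU, you will get: data-00000-of-00002 and data-00001-of-00002.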
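
To make the naming scheme concrete, here is a minimal sketch in plain Python (no TensorFlow required) that reads the shard index and the shard count out of a data file name. The helper parse_shard_suffix is a name I made up for illustration, not a TensorFlow API, and it assumes the five-digit data-XXXXX-of-NNNNN pattern shown above.

```python
import re

# Hypothetical helper (not part of TensorFlow): matches the
# ".data-XXXXX-of-NNNNN" suffix at the end of a checkpoint data file name.
_SHARD_RE = re.compile(r"\.data-(\d{5})-of-(\d{5})$")


def parse_shard_suffix(filename):
    """Return (shard_index, num_shards), or None if there is no shard suffix."""
    match = _SHARD_RE.search(filename)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


if __name__ == "__main__":
    names = [
        "model.ckpt.data-00000-of-00001",  # the single-shard case from the question
        "model.ckpt.data-00001-of-00002",  # second of two shards
        "model.ckpt.index",                # not a data file, so no suffix
    ]
    for name in names:
        print(name, "->", parse_shard_suffix(name))
```

Read this way, 00000-of-00001 is simply shard 0 out of 1 shard in total, which is what you would expect to keep seeing as long as the saver is not sharded.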