How to compare a local folder to a folder in a bucket?

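We are archiving our projects to a bucket (using gsutil rsync). My task is to verify each upload: after every upload, the local project folder must be compared with the folder that was uploaded to the bucket. This is to make sure that the local data has actually been uploaded to the bucket in full.

How can I perform such a check reliably?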

The gsutil rsync command itself performs checksum validation for every uploaded file. From Checksum Validation And Failure Handling:

At the end of every upload or download, the gsutil rsync command validates that the checksum of the source file/object matches the checksum of the destination file/object. If the checksums do not match, gsutil will delete the invalid copy and print a warning message.

[snip]

The rsync command will retry when failures occur, but if enough failures happen during a particular copy or delete operation the command will fail.

If the -C option is provided, the command will instead skip the failing object and move on. At the end of the synchronization run if any failures were not successfully retried, the rsync command will report the count of failures, and exit with non-zero status. At this point you can run the rsync command again, and it will attempt any remaining needed copy and/or delete operations.

[snip]

For more details about gsutil's retry handling, please see gsutil help retries.

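So you could:

  • simply check the command's stdout/stderr for any warning that indicates such a checksum failure or a fatal error
  • run gsutil rsync with the -C option and use its failure tracking results directly (some automatic retries will already have been attempted)
  • perform an alternative (paranoid) check that only verifies that every file that should have been synced exists at the destination (gsutil deletes any copy that fails validation, so if a file exists there, its content must have passed the checksum check)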
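
The second and third options could be scripted roughly as follows. This is only a minimal sketch under some assumptions: gsutil is installed and on the PATH, `gsutil ls` with a `**` wildcard prints one object URL per line, and the function names and the `gs://my-bucket/archive` URL used below are hypothetical placeholders, not anything from the question.

```python
import os
import subprocess


def run_rsync(local_root, prefix_url):
    """Run `gsutil rsync -r -C` and return its exit status (non-zero means failures remained)."""
    result = subprocess.run(["gsutil", "rsync", "-r", "-C", local_root, prefix_url])
    return result.returncode


def local_relative_paths(root):
    """Return the set of file paths under root, relative to root, with '/' separators."""
    paths = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, root).replace(os.sep, "/"))
    return paths


def bucket_relative_paths(listing_lines, prefix_url):
    """Turn lines of `gsutil ls` output into object paths relative to prefix_url."""
    prefix = prefix_url.rstrip("/") + "/"
    paths = set()
    for line in listing_lines:
        line = line.strip()
        # Skip blank lines, unrelated URLs, and directory-style entries
        # (assumed to end in ':' or '/').
        if not line.startswith(prefix) or line.endswith((":", "/")):
            continue
        paths.add(line[len(prefix):])
    return paths


def missing_in_bucket(local_paths, bucket_paths):
    """Return the sorted relative paths that exist locally but not in the bucket."""
    return sorted(local_paths - bucket_paths)


def verify_upload(local_root, prefix_url):
    """List the bucket folder with gsutil and report local files that are absent there."""
    listing = subprocess.run(
        ["gsutil", "ls", prefix_url.rstrip("/") + "/**"],
        capture_output=True, text=True, check=True,
    ).stdout.splitlines()
    return missing_in_bucket(
        local_relative_paths(local_root),
        bucket_relative_paths(listing, prefix_url),
    )
```

A possible use would be to call `run_rsync("project", "gs://my-bucket/archive")`, treat a non-zero return value as a failed upload, and then call `verify_upload("project", "gs://my-bucket/archive")` and expect an empty list. The existence check compares names only; it relies on the quoted behaviour that gsutil deletes copies whose checksum does not match.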