Make zstd-compressed files 'rsyncable', like gzip does with its --rsyncable option

Is there a way to make zstd-compressed files 'rsyncable', like gzip does with its --rsyncable option?

About the --rsyncable option:

When you synchronize a compressed file between two computers, this option allows rsync to transfer only files that were changed in the archive instead of the entire archive. Normally, after a change is made to any file in the archive, the compression algorithm can generate a new version of the archive that does not match the previous version of the archive. In this case, rsync transfers the entire new version of the archive to the remote computer. With this option, rsync can transfer only the changed files as well as a small amount of metadata that is required to update the archive structure in the area that was changed.

I've tried splitting the input files into fixed-length chunks and compressing them separately, with no luck.

If you only change bytes in place, without shifting them, this should work with no problem.

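That is, if you split "The hog crawled under the high fence" into fixed-size chunks ["The hog ", "crawled ", "under th", "e high f", "ence"] and compress them independently, then changing "hog" to "dog" is rsync-friendly, because the compressed versions of the remaining chunks ["crawled ", "under th", "e high f", "ence"] stay the same.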

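On the other hand, if you shift bytes, for example by replacing "hog" with "caterpillar", then the split no longer helps, because the later chunk boundaries fall at different places in the text: chunks such as ["The cat", "erpillar", " crawled", " under t", "he high ", "fence"] are now different, and so are their compressed versions.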

Rsync helps with the former but not with the latter.
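A rough way to see both cases in code. This is only a sketch: it uses Python's zlib as a stand-in for zstd (an assumption, so that the snippet runs with the standard library alone), and the helper names compress_fixed_chunks and shared_chunks are made up for illustration. Counting identical compressed chunks is a crude proxy for what rsync could reuse, not a model of rsync itself. The shifting case uses "piglet" rather than "caterpillar", because "caterpillar" is exactly 8 bytes longer than "hog" and with 8-byte chunks that shift would happen to land back on the old boundaries.

```python
import zlib

def compress_fixed_chunks(data: bytes, size: int = 8) -> list[bytes]:
    """Split data into fixed-size chunks and compress each one independently."""
    return [zlib.compress(data[i:i + size]) for i in range(0, len(data), size)]

def shared_chunks(old: list[bytes], new: list[bytes]) -> int:
    """Count compressed chunks in `new` that also appear in `old`."""
    seen = set(old)
    return sum(1 for chunk in new if chunk in seen)

original = b"The hog crawled under the high fence"
in_place = b"The dog crawled under the high fence"     # same length, one byte changed
shifted = b"The piglet crawled under the high fence"   # 3 bytes longer, text shifts

base = compress_fixed_chunks(original)
same_after_in_place = shared_chunks(base, compress_fixed_chunks(in_place))
same_after_shift = shared_chunks(base, compress_fixed_chunks(shifted))

print("chunks reused after in-place change:", same_after_in_place)
print("chunks reused after shifting change:", same_after_shift)
```

With the in-place change, every chunk except the first is reused; with the shifting change, none are.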

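If you want to support arbitrary modifications, you need a smarter chunk-splitting algorithm, one that gravitates towards certain points in the file. For example, if you split "The hog crawled under the high fence" on spaces into "The ", "hog ", "crawled ", "under ", "the ", "high ", "fence", then replacing "hog" with "caterpillar" changes only one compressed chunk, saving rsync from transmitting the rest.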
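The space-splitting idea can be sketched the same way. Again zlib stands in for zstd (an assumption), and compress_word_chunks is a hypothetical helper, not part of any library.

```python
import re
import zlib

def split_on_spaces(text: bytes) -> list[bytes]:
    """Split so that each chunk ends right after a space (the last one may not)."""
    return re.findall(rb"[^ ]* |[^ ]+", text)

def compress_word_chunks(text: bytes) -> list[bytes]:
    """Compress every space-delimited chunk independently."""
    return [zlib.compress(chunk) for chunk in split_on_spaces(text)]

original = b"The hog crawled under the high fence"
edited = b"The caterpillar crawled under the high fence"

old = compress_word_chunks(original)
new = compress_word_chunks(edited)
reused = sum(1 for chunk in new if chunk in set(old))

print(f"{reused} of {len(new)} compressed chunks are unchanged")
```

Because the boundaries follow the content rather than fixed offsets, only the chunk holding the replaced word differs.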

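P.S. It looks like LBFS uses such a chunk-splitting scheme: "by sliding a 48-byte window over the file and computing the Rabin fingerprint of each window. When the low 13 bits of the fingerprint are zero, LBFS calls those 48 bytes a breakpoint, ends the current chunk, and begins a new one."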
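A minimal sketch of that kind of content-defined chunking, under stated assumptions: it uses a simple polynomial rolling hash instead of a real Rabin fingerprint, the constants BASE and MOD are arbitrary choices, and cdc_chunks is a hypothetical name. Only the 48-byte window and the 13-bit test come from the description above. It is not LBFS's or zstd's actual implementation.

```python
import random

WINDOW = 48             # window size, from the LBFS description
MASK = (1 << 13) - 1    # a breakpoint is declared when the low 13 bits are zero
BASE = 257              # arbitrary polynomial base (assumption)
MOD = (1 << 61) - 1     # arbitrary large prime modulus (assumption)

def cdc_chunks(data: bytes) -> list[bytes]:
    """Split data wherever the rolling hash of the last WINDOW bytes hits the mask."""
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # take in the newest byte
        if i + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])                 # trailing partial chunk
    return chunks

rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(400_000))
edited = original[:1000] + b"caterpillar" + original[1000:]  # shifts everything after

old_chunks = cdc_chunks(original)
new_chunks = cdc_chunks(edited)
unchanged = set(old_chunks) & set(new_chunks)

print(f"{len(unchanged)} of {len(set(old_chunks))} chunks survive the insertion")
```

The hash depends only on the last 48 bytes, so once the window has moved past the inserted bytes the breakpoints fall on the same content as before, and the chunks after the edit come out identical even though their offsets have changed.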

With version 1.3.8, zstd introduced a --rsyncable mode.