Any difference between "tar cf xxxx | gzip -n" versus "tar zcf xxxx"?

Question

I'm maintaining some old scripts and came across this:

tar -cvf - ${files} | gzip -n -c | openssl ...

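Is there any practical difference between this and the more compact version below, which drops gzip's "-n"? Is there another way to pass "-n" to gzip from within the tar command?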

tar -cvzf - ${files} | openssl ...

This is Linux 3.0.101-0.47.71-default. I expect a slight performance improvement, but my concern is not to cause any changes downstream.

Answer 1

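Old tars didn't have gzip compression built in, which likely explains the pipe in the old script. On any gzip I know of, -n does nothing meaningful when the data is fed through stdin, other than, of course, setting the compression timestamp in the gzip data to 0.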

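Honestly, I doubt that option is of much use: on everything I could test, the output size stays the same. That is the correct behavior, because the header (specified in RFC 1952, Sec. 2.3.1) only has a 4-byte timestamp in seconds since the epoch, and setting it to 0 means "no timestamp stored". So unless you specifically need to keep the recipient of the gzip data from knowing when it was compressed, -n gives you no benefit. (If you have some authentication scheme that relies on a time being unknown, such as session IDs generated from the time since the device booted, omitting such timestamps could bring a security benefit, but frankly I'd rather worry about plugging those security holes than about zeroing the timestamp.)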
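
A minimal sketch for checking this yourself, assuming GNU gzip and the standard od, wc and tr utilities. It compresses the same piped input with and without -n, compares the output sizes, and dumps the 4-byte MTIME field (bytes 4 to 7 of the header, little-endian per RFC 1952) of the -n output.

```shell
#!/bin/sh
# Sketch: what gzip -n changes when the input comes from a pipe.
# Assumes GNU gzip plus the POSIX od/wc/tr utilities.
set -eu

# Compress the same bytes from a pipe, once without and once with -n.
plain_size=$(printf 'hello, world\n' | gzip -c | wc -c | tr -d ' ')
n_size=$(printf 'hello, world\n' | gzip -n -c | wc -c | tr -d ' ')

# Bytes 4..7 of a gzip member header are MTIME (RFC 1952), little-endian.
# With -n they should all be zero, meaning "no timestamp available".
mtime_hex=$(printf 'hello, world\n' | gzip -n -c | od -A n -t x1 -j 4 -N 4 | tr -d ' \n')

echo "without -n: $plain_size bytes, with -n: $n_size bytes"
echo "MTIME field with -n: $mtime_hex"
```

Whether the field is also zero without -n depends on the gzip version, so compare the od output of both variants on your own system.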

Answer 2

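It turns out there can be a significant difference between the two, at least with the tar I'm using (the default tar on macOS). When writing to stdout, tar writes all output in blocks and pads the last block with zeros to fill it, even when compressing; in that case the padding comes after the compressed data. This is documented in the man page, so it's not a bug. The default block size is 10K. With the default blocking factor, the output of tar -cvzf is almost always larger than that of tar -cvf ... | gzip, by about 5K on average.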

From the man page:

 All archive output is written in correctly-sized blocks, even if the output
 is being compressed.  Whether or not the last output block is padded
 to a full block size varies depending on the format and the output
 device.  For tar and cpio formats, the last block of output is padded to
 a full block size if the output is being written to standard output or to
 a character or block device such as a tape drive.  If the output is being
 written to a regular file, the last block will not be padded.  Many
 compressors, including gzip(1) and bzip2(1), complain about the null padding
 when decompressing an archive created by tar, although they still extract
 it correctly.

I'm using bsdtar 2.8.3.
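
The padding described above can be mimicked by hand, without bsdtar installed, to see both the size overhead and the fact that the data still decompresses. This is only an approximation under one assumption: that the padding is plain zero bytes appended after the gzip stream up to the next multiple of the default 10240-byte block, as the man page excerpt describes. It needs tar, GNU gzip, seq, head and cmp; the file name sample.txt is hypothetical.

```shell
#!/bin/sh
# Sketch: hand-made imitation of bsdtar's stdout padding of compressed output.
# Assumption: padding = zero bytes after the gzip stream, up to the next
# multiple of the 10240-byte default block (20 records of 512 bytes).
set -eu
block=10240
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample input.
seq 1 5000 > "$tmp/sample.txt"

# Equivalent of "tar -cf - ... | gzip": nothing after the gzip stream.
tar -cf - -C "$tmp" sample.txt | gzip -n -c > "$tmp/piped.tgz"
piped=$(wc -c < "$tmp/piped.tgz" | tr -d ' ')

# Round up to the next whole block and append that many zero bytes.
padded=$(( (piped + block - 1) / block * block ))
cp "$tmp/piped.tgz" "$tmp/padded.tgz"
head -c "$((padded - piped))" /dev/zero >> "$tmp/padded.tgz"
echo "piped: $piped bytes, padded: $padded bytes, overhead: $((padded - piped)) bytes"

# GNU gzip warns about the trailing zeros (exit status 2 means "warning"),
# but the decompressed tar stream should be unchanged.
status=0
gzip -dc "$tmp/padded.tgz" > "$tmp/from_padded.tar" 2>/dev/null || status=$?
gzip -dc "$tmp/piped.tgz" > "$tmp/from_piped.tar"
if cmp -s "$tmp/from_padded.tar" "$tmp/from_piped.tar"; then same=yes; else same=no; fi
echo "gzip exit status: $status, identical tar stream: $same"
```

Since the padding can be anywhere from 0 to 10239 bytes, the overhead averages roughly half a block, i.e. about 5K, which is consistent with the difference reported for bsdtar.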

GNU tar 1.29 doesn't do this, even though the documentation seems to say it should:

If the output goes directly to a local disk, and not through stdout,
then the last write is not extended to a full record size. Otherwise,
reblocking occurs.

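The documentation goes on to mention the same problem of gzip complaining about trailing zeros. However, there are no trailing zeros when the output goes through stdout.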
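
To see which behavior your own tar has, one quick check is to compress the same input both ways and compare the sizes. A sketch, assuming a tar that supports -z and -C (GNU tar and bsdtar both do) and a hypothetical sample.txt: if the tar -z output is an exact multiple of 10240 bytes, it was most likely padded.

```shell
#!/bin/sh
# Sketch: does the local tar pad its own compressed output on stdout?
# Assumes a tar with -z and -C (GNU tar and bsdtar both have them) and gzip.
set -eu
tmp=$(mktemp -d)
trap 'rm -rf "$tmp"' EXIT

# Hypothetical sample input.
seq 1 5000 > "$tmp/sample.txt"

# Same input, compressed by an external gzip vs. by tar -z.
piped=$(tar -cf - -C "$tmp" sample.txt | gzip -c | wc -c | tr -d ' ')
tar_z=$(tar -czf - -C "$tmp" sample.txt | wc -c | tr -d ' ')

echo "tar -cf - | gzip: $piped bytes"
echo "tar -czf -:       $tar_z bytes"

# Padded output is a whole number of default 10240-byte blocks.
if [ $((tar_z % 10240)) -eq 0 ]; then
  echo "tar -z output is block-aligned: most likely padded"
else
  echo "tar -z output is not padded to a 10240-byte block"
fi
```

With GNU tar the two sizes will typically match, since tar -z hands the stream to gzip; if your tar uses a different compressor implementation, small differences besides any padding are possible.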
