How can I predict the size of the data after zlib decompression?

Question

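I am writing a simple C++ application that has to send compressed data to my API. The API sends back a response that is also compressed, and I have to decompress it. I am using zlib's uncompress() function, but I don't know how large the decompressed data will be. Can someone help me with this? How do I calculate and set the size of the destination buffer?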

Answer 1

I think the documentation is pretty clear on this:

ZEXTERN int ZEXPORT uncompress OF((Bytef *dest, uLongf *destLen,
                               const Bytef *source, uLong sourceLen));
Decompresses the source buffer into the destination buffer. sourceLen is the byte length of the source buffer. Upon entry, destLen is the total size of the destination buffer, which must be large enough to hold the entire uncompressed data. (The size of the uncompressed data must have been saved previously by the compressor and transmitted to the decompressor by some mechanism outside the scope of this compression library.) Upon exit, destLen is the actual size of the uncompressed data.

uncompress returns Z_OK if success, Z_MEM_ERROR if there was not enough memory, Z_BUF_ERROR if there was not enough room in the output buffer, or Z_DATA_ERROR if the input data was corrupted or incomplete. In the case where there is not enough room, uncompress() will fill the output buffer with the uncompressed data up to that point.

So zlib recommends sending the uncompressed size along with the compressed stream.

But we can also take note of this sentence:

In the case where there is not enough room, uncompress() will fill the output buffer with the uncompressed data up to that point.

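So you could put the length at the very beginning of your message before compressing it. Then, on the receiving side, start by calling uncompress() with a small buffer. It will probably not be able to decompress everything into the small buffer, but because the length was written first, it will decompress enough for you to read the length of the data. You can then use that to allocate or resize your destination buffer and call uncompress() again.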

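Depending on your use case this may or may not be a good idea. If your message sizes don't vary much and the program is long-running, it is probably better to keep a single destination buffer around and grow it as needed.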

Answer 2

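As a speed optimization, if you are willing to make an occasional redundant call to uncompress(), you can predict the size of the destination buffer for the next call. Typically, the segments of data in a given stream compress by roughly the same factor. For example, text usually compresses by a factor of 2 to 3. So record the last size of your destination buffer somewhere, then allocate the same amount for the next uncompress() call. If that is too little (Z_BUF_ERROR), increase the buffer size and repeat. If the buffer has too much space, no problem; just reduce the size for the next call.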

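Here is an additional optimization. Suppose your destination is going to be very large, say gigabytes, and you don't want to waste CPU cycles on trial decompressions. You can feed in just the first few hundred KB of the source data and see how much it expands, then allocate the actual destination buffer accordingly. I don't know whether uncompress() will let you do that, but inflate() will.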

c++

compression

zlib