DEFLATE method reasoning

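Why does DEFLATE use Huffman coding for its second pass, after LZ77, rather than LZW? Is there something about their combination that is optimal? If so, what is it about the nature of LZ77's output that makes it better suited to Huffman compression than to LZW or some other method?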

Mark Adler could best answer this question.

The details of how the LZ77 and Huffman work together need some closer examination. Once the raw data has been turned into a string of characters and special length, distance pairs, these elements must be represented with Huffman codes.

Though this is NOT, repeat, NOT standard terminology, call the point where we start reading in bits a "dial tone." After all, in our analogy, the dial tone is where you can start specifying a series of numbers that will end up mapping to a specific phone. So call the very beginning a "dial tone." At that dial tone, one of three things could follow: a character, a length-distance pair, or the end of the block. Since we must be able to tell which it is, all the possible characters ("literals"), elements that indicate ranges of possible lengths ("lengths"), and a special end-of-block indicator are all merged into a single alphabet. That alphabet then becomes the basis of a Huffman tree. Distances don't need to be included in this alphabet, since they can only appear directly after lengths. Once the literal has been decoded, or the length-distance pair decoded, we are at another "dial-tone" point and we start reading again. If we got the end-of-block symbol, of course, we're either at the beginning of another block or at the end of the compressed data.
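The "dial tone" loop described above can be sketched in a few lines. This is a simplified illustration, not zlib's implementation: it assumes the Huffman codes have already been decoded into integer symbols, it only handles length symbols 257 to 264 (the ones that need no extra bits, mapping to lengths 3 to 10), and it takes each distance as a plain integer instead of a distance code. The name `decode_symbols` is made up for this example.

```python
END_OF_BLOCK = 256  # symbol that terminates a block in the merged alphabet


def decode_symbols(symbols):
    """Replay a stream of already Huffman-decoded symbols.

    Assumed (hypothetical) input layout: literal/length symbols, where each
    length symbol is immediately followed by a distance given as a plain int.
    """
    out = bytearray()
    it = iter(symbols)
    for sym in it:                      # every iteration starts at a "dial tone"
        if sym < 256:                   # literal byte
            out.append(sym)
        elif sym == END_OF_BLOCK:       # end of block: stop reading
            break
        elif 257 <= sym <= 264:         # length symbol without extra bits
            length = sym - 254          # 257 -> 3, ..., 264 -> 10
            distance = next(it)         # a distance always follows a length
            for _ in range(length):     # byte-by-byte so overlapping copies work
                out.append(out[-distance])
        else:
            raise ValueError(f"symbol {sym} is outside this sketch's range")
    return bytes(out)


print(decode_symbols([97, 98, 99, 257, 3, 256]))
```

The point of the sketch is the dispatch: a single alphabet lets the decoder tell literals, lengths, and end-of-block apart, and a distance is only ever read immediately after a length.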

Length codes or distance codes may actually be a code that represents a base value, followed by extra bits that form an integer to be added to the base value.
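As a concrete illustration of "base value plus extra bits", here is a small sketch for the length codes. The two tables are the length-code values as I understand them from the DEFLATE specification (RFC 1951); `decode_length` and the `read_bits` callback are hypothetical names introduced for this example, with `read_bits(n)` assumed to return the next `n` bits of the stream as an integer (and 0 when `n` is 0).

```python
# Base length and number of extra bits for length symbols 257..285.
LENGTH_BASE = [3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 15, 17, 19, 23, 27, 31,
               35, 43, 51, 59, 67, 83, 99, 115, 131, 163, 195, 227, 258]
LENGTH_EXTRA = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2,
                3, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 0]


def decode_length(symbol, read_bits):
    """Turn a length symbol (257..285) into an actual match length."""
    idx = symbol - 257
    if not 0 <= idx < len(LENGTH_BASE):
        raise ValueError(f"{symbol} is not a length symbol")
    # The symbol selects a base value; the extra bits pick a value in its range.
    return LENGTH_BASE[idx] + read_bits(LENGTH_EXTRA[idx])


# Symbol 269 covers lengths 19..22 using 2 extra bits; extra value 3 gives 22.
print(decode_length(269, lambda n: 3))
```

Grouping lengths into ranges keeps the Huffman alphabet small, while the extra bits, which are stored uncoded, select the exact value within a range. Distance codes follow the same pattern with their own tables.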

...

Read the whole deal here.

In short: LZ77 provides duplicate elimination. Huffman coding provides bit reduction. It's also on the wiki.
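One rough way to see the two contributions separately is to ask zlib to skip string matching and compare the result with normal DEFLATE. This is only a sketch: the sample text, the helper name `deflate_size`, and the choice of raw DEFLATE output (`wbits=-15`) are arbitrary choices for the illustration, and the exact sizes will depend on the input and the zlib version.

```python
import zlib


def deflate_size(data, strategy):
    """Return the raw DEFLATE size of data under the given zlib strategy."""
    comp = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15,
                            memLevel=9, strategy=strategy)
    return len(comp.compress(data) + comp.flush())


data = b"the quick brown fox jumps over the lazy dog. " * 200

# Huffman coding alone: fewer bits per byte, but repeats are not removed.
huffman_only = deflate_size(data, zlib.Z_HUFFMAN_ONLY)
# LZ77 matching followed by Huffman coding: the normal DEFLATE path.
lz77_plus_huffman = deflate_size(data, zlib.Z_DEFAULT_STRATEGY)

print(len(data), huffman_only, lz77_plus_huffman)
```

On highly repetitive input like this, the Huffman-only output should be smaller than the original but far larger than the combined result, which is the "bit reduction" versus "duplicate elimination" split described above.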

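LZW tries to take advantage of repeated strings, just like the first "stage," as you call it, of LZ77. It then does a poor job of entropy coding that information. LZW has been completely supplanted by more modern approaches (except for its legacy use in the GIF format). Once LZ77 has generated a list of literals and matches, there is nothing left for LZW to exploit, and it would make an almost completely ineffective entropy coder for that information.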