霍夫曼树是如何传播的?

How are Huffman trees transmitted?

我正在尝试了解 DEFLATE 算法的工作原理。我找到了这个 document published by UC Davis。哈夫曼树是怎么传播的那部分没看懂

Probably the trickiest part of the DEFLATE specification to understand is the way trees are encoded to go along with the data, when that data is compressed with specialized trees.

The trees are transmitted by their codelengths, as previously discussed. The codelengths are put all together into a sequence of numbers between 0 and 15 (the Huffman trees that are created must be kept to codelengths of no more than 15; this is the tricky part, not the part about constraining the order of the elements).

Not all the elements have to be given codelengths; if the last elements of an alphabet are of 0 codelengths, they can and probably should be left out. The number of elements in each of the two alphabets will be transmitted, so the trimmed alphabets go together into a single sequence.

首先,codelength到底是什么意思,为什么可以为0?

我也不太了解运行-长度压缩,他们在最后一段之后提到了它。

Once this sequence of codelengths is assembled, it is compressed with a form of what is called run-length compression. When several elements in a row have the same codelength (often 0) special symbols may be used to indicate the number of elements with this codelength. Our sequence is now a sequence of numbers between 0 and 18 (possibly with extra bits forming integers to modify base values, as was the case with the length and distance codes).

A Huffman tree is created for this alphabet of 0-18. Sigh. The sequence of 0-18 codes and extra bits is prepared with the Huffman codes replacing the 0-18 elements.

代码长度是该符号的代码长度(以位为单位)。

零代码长度表示该符号未出现在压缩数据中,因此该符号没有代码。

运行-length encoding 意味着,在这种情况下,重复代码长度的序列,例如“7, 7, 7, 7, 7, 7”,替换为“7,重复最后一个长度5次”。