我是否正确打包字节以进行 DEFLATE 压缩?

Am I packing bytes correctly for DEFLATE compression?

所以,我知道有很多库可用于进行 DEFLATE 压缩。如果我正在开发生产产品,我会使用 zlib 之类的东西。但作为一种爱好,我正在自己实施它以尝试弄清楚。因此,经过几周的编码、重新编码和调整,我终于能够在合理的时间范围内产生一些我认为合适的输出。但是,如果我尝试 post 我的输出到其中一个在线工具中,我会收到不一定能帮助我找出输出问题的错误。当我让我的程序生成一个实际的位串并手动解析它时,一切似乎都符合 DEFLATE 标准,我可以重建我的数据。这让我相信我的编码是正确的,但我可能完全误解了打包字节时不同的位顺序。以下是我的输出的 Base64 编码版本,然后是我的程序生成的 8 位字节列表。如果有人可以帮助我指出数据失败的地方,将不胜感激。

Defective program output (both Base64 and raw bytes):

Base64 Encoded Output:
ZYQhAQAADMKqQBWagELQXz/AzTQX+eAB

Byte List:
01100101
10000100
00100001
00000001
00000000
00000000
00001100
11000010
10101010
01000000
00010101
10011010
10000000
01000010
11010000
01011111
00111111
11000000
11001101
00110100
00010111
11111001
11100000
00000001

作为我对文档的理解的概述。该标准表示一个块以 1 位开始,以声明它是否是最后一个块,然后是 2 位以声明使用哪种压缩类型,然后是 5 位 hlit、5 位 hdist、4 位 hclen,然后是 hclen + 4 组 3 位,每组给出代码长度对于用于输出 literal/length 代码的代码长度以及距离代码的霍夫曼代码。在此之后是 hlit+257+hdist+1 代码长度的霍夫曼编码字符串,最后是实际压缩数据的霍夫曼编码字符串,由块代码结尾封顶。有趣的是,霍夫曼代码本身是按相反顺序打包的……然而我感到困惑的地方是一些长度代码(代码 16、17、18)之后的 "extra bits" 以及更高的长度和距离代码。那些以与霍夫曼代码相同的相反顺序打包还是被视为 "data other than huffman code"?

Looking at first byte in list (byte 0):

*01 = last block bit
*02 = 2bit compression type (10 = dynamic huffman)
*03 = msb of hlit (#of literal/length codes - 257)

 *03                 *02     *01
  v                   v       v
+-------------------------------+
| 0   1   1   0   0   1   0   1 |
+-------------------------------+
| Byte 0                        |



Looking at bytes 8 and 9 (starting with byte 0):

*01 = last bit of hclen + 4 sets of codelen code lengths
*02 = msb of huffman code "10" ("10" = codelen code 18 - repeat 0 11-138 times)
*03 = lsb of 7 "extra bits" for codelen code 18
*04 = msb of 7 "extra bits" for codelen code 18

                  *03     *02 *01                           *04
                   v       v   v                             v
+---------------------------------+---------------------------------+
|  1   0   1   0   1   0   1   0  |  0   1   0   0   0   0   0   0  |
+---------------------------------+---------------------------------+
| Byte 8                          | Byte 9                          |

这是我的程序的一些额外输出,其中包含实际使用的哈夫曼代码:

--------------------------------------------------------------------------
Literal/Length Bit Codes:  Block: 0    hlit: (269 - 257) = 12
--------------------------------------------------------------------------
Code: 32        Count: 1        BitCode: 000                Bit Length: 3                   
Code: 33        Count: 1        BitCode: 001                Bit Length: 3                   
Code: 66        Count: 1        BitCode: 010                Bit Length: 3                   
Code: 97        Count: 1        BitCode: 011                Bit Length: 3                   
Code: 98        Count: 1        BitCode: 100                Bit Length: 3                   
Code: 104       Count: 1        BitCode: 101                Bit Length: 3                   
Code: 108       Count: 1        BitCode: 110                Bit Length: 3                   
Code: 256       Count: 1        BitCode: 1110               Bit Length: 4                   
Code: 268       Count: 1        BitCode: 1111               Bit Length: 4                   

--------------------------------------------------------------------------
Distance Bit Codes:  Block: 0    hdist: (5 - 1) = 4
--------------------------------------------------------------------------
Code: 4         Count: 1        BitCode: 00                 Bit Length: 2                   

--------------------------------------------------------------------------
CodeLength Bit Codes:  Block: 0    hclen: (16 - 4) = 12
--------------------------------------------------------------------------
Code: 2         Count: 1        BitCode: 110                Bit Length: 3                   
Code: 3         Count: 7        BitCode: 00                 Bit Length: 2                   
Code: 4         Count: 2        BitCode: 111                Bit Length: 3                   
Code: 17        Count: 4        BitCode: 01                 Bit Length: 2                   
Code: 18        Count: 5        BitCode: 10                 Bit Length: 2                   

--------------------------------------------------------------------------
--------------------------------------------------------------------------

多余的位没有反转。

您的问题是长度为 2 的单个距离代码是不允许的。单个距离代码的长度必须为 1。来自 RFC 1951:

If only one distance code is used, it is encoded using one bit, not zero bits; in this case there is a single code length of one, with one unused code.