LZW encoding and the GIF file format

Question

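I'm trying to understand how to create a .gif file in C++. So far I think I understand everything except how the LZW encoding works. Here is the file I generated, with labels: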

47 49 46 38 39 61 -header
0A 00 01 00 91 00 -logical screen descriptor
00 00 FF 00 FF 00 -color table [green,red,yellow,black]
00 FF FF 00 00 00
00 21 F9 04 00 00 -graphics control extension
00 00 00 2C 00 00 -image descriptor
00 00 0A 00 01 00 -(10 pixels wide x 1 pixel tall)
00 02 04 8A 05 00 -encoded image
3B                -terminator
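
For reference, the `91` in the logical screen descriptor is a packed byte. Below is a minimal sketch of how the GIF89a spec lays out its fields; the struct and function names are made up for illustration and do not come from any library.

```cpp
#include <cstdint>

// Fields of the packed byte in the Logical Screen Descriptor (GIF89a).
// ScreenFlags and unpackScreenFlags are illustrative names, not a library API.
struct ScreenFlags {
    bool hasGlobalColorTable;   // bit 7
    int  colorResolution;       // bits 6-4, stored as (bits per primary color - 1)
    bool sorted;                // bit 3
    int  globalColorTableSize;  // number of entries: 2^(N+1), N = bits 2-0
};

ScreenFlags unpackScreenFlags(std::uint8_t packed) {
    ScreenFlags f;
    f.hasGlobalColorTable  = (packed & 0x80) != 0;
    f.colorResolution      = (packed >> 4) & 0x07;
    f.sorted               = (packed & 0x08) != 0;
    f.globalColorTableSize = 1 << ((packed & 0x07) + 1);
    return f;
}
```

For `0x91` this reports a global color table with 4 entries, which matches the 12 bytes of color data (green, red, yellow, black) in the dump.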

For copy/paste purposes, here it is again without the labels: 47 49 46 38 39 61 05 00 04 00 91 00 00 00 FF 00 FF 00 00 FF FF 00 00 00 00 21 F9 04 00 00 00 00 00 2C 00 00 00 00 0A 00 01 00 00 02 04 8A 05 00 3B

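I'm having trouble understanding how 02 04 8A 05 translates into the image yryryggyry. I know that 02 is the minimum code size and 04 is the length of the image block, and I think I've identified the clear and EOI codes, but I don't understand the codes in between.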

8A       05
10001010 00000101
100|01010 00000|101
 ^      ????     ^
 clear code      EOI code
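
One way to check a bit diagram like this mechanically: the GIF spec packs codes into bytes starting at the least significant bit, so the first code sits in the low bits of the first byte. Here is a rough sketch of reading fixed-width codes in that order. The name `unpackCodes` is hypothetical, and the sketch deliberately ignores the code-size growth that a full decoder has to track.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reads as many whole `width`-bit codes as fit in `bytes`, least significant
// bit first. Illustrative helper only: the width stays fixed, unlike in a
// real GIF decoder, and leftover bits at the end are dropped.
std::vector<int> unpackCodes(const std::vector<std::uint8_t>& bytes, int width) {
    assert(width >= 1 && width <= 12);
    std::vector<int> codes;
    std::uint32_t buffer = 0;  // unread bits, oldest in the lowest positions
    int bitCount = 0;          // number of valid bits in buffer
    for (std::uint8_t b : bytes) {
        buffer |= static_cast<std::uint32_t>(b) << bitCount;
        bitCount += 8;
        while (bitCount >= width) {
            codes.push_back(static_cast<int>(buffer & ((1u << width) - 1)));
            buffer >>= width;
            bitCount -= width;
        }
    }
    return codes;
}
```

Called with `{0x8A, 0x05}` and a width of 3, it returns 2, 1, 6, 2, 0, which is a different reading from the left-to-right split in the diagram.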

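So far I've gotten most of my information from the .gif spec: http://www.w3.org/Graphics/GIF/spec-gif89a.txt

This website was also very helpful: http://www.matthewflickinger.com/lab/whatsinagif/lzw_image_data.asp

Thanks

Edit*

I watched the YouTube video linked in the comments and encoded the image by hand for the color stream "yryryggyry":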

Color table-012=gry

2   1   2   1   2   0   0   2   1   2
010 001 010 001 010 000 000 010 001 010

current next output dict
010     001  010    21 6
001     010  001    12 7
010     001  -      -
001     010  110    121 8
010     000  010    212 9
000     000  000    00  10
000     010  1010   002 11
010     001  -      -
001     010  110    -
010     -    010    -

outputs-100 010 001 110 010 000 1010 110 010 101

01010101 4th 55
10101000 3rd A8
00101100 2nd 2C
01010100 1st 54

Code-54 2C A8 55

I must have made a mistake, because this code produces the image "yr" instead of "yryryggyry".

I'm going to redo it and see whether I get a different answer.

Answer 1

Maybe you made a mistake on line 4: 001 010 110 121 8

On line 3, "010" was skipped (nothing was output), so you have to carry it over to the front of line 4. Line 4 then becomes:

current  next  output    dict
010 001  010   010 001   212   8

Here is my solution (also worked out by hand):

[Image: LZW for yryryggyry]

Update:

I finally figured out the reason:

When you are encoding the data, you increase your code size as soon as your write out the code equal to 2^(current code size)-1. If you are decoding from codes to indexes, you need to increase your code size as soon as you add the code value that is equal to 2^(current code size)-1 to your code table. That is, the next time you grab the next section of bits, you grab one more.

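The author means that you should increase the word size when you are about to output 2^(current code size) - 1, but there is another interpretation that also seems to make sense:

When you add item #(2 ^ current code size) to the code table, the next output should increase its word size.

This is also correct for the author's example, and it is the interpretation I prefer.

Here is your example ("yryryggyry"):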

output sequence:
    #4 #2 #1 #6 #2 #0 #0 #8 #5
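
The sequence above can be reproduced with a small encoder. This is only a sketch of the table-building step, assuming color indices 0=g, 1=r, 2=y as in the question. The function name `lzwCodes` is made up, and it never resets a full table, so it is only meant for short inputs like this one.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Turns a row of color indices into GIF LZW codes (clear code first, EOI last).
// Sketch only: every index must be below 2^minCodeSize, and there is no
// handling for a table that grows past 4096 entries.
std::vector<int> lzwCodes(const std::vector<int>& indices, int minCodeSize) {
    assert(minCodeSize >= 2 && minCodeSize <= 8);
    const int clearCode = 1 << minCodeSize;
    const int eoiCode = clearCode + 1;
    int nextCode = eoiCode + 1;                  // first free table slot

    std::map<std::vector<int>, int> table;       // pixel string -> code
    for (int i = 0; i < clearCode; ++i) table[std::vector<int>{i}] = i;

    std::vector<int> out{clearCode};
    std::vector<int> current;                    // longest match so far
    for (int index : indices) {
        std::vector<int> extended = current;
        extended.push_back(index);
        if (table.count(extended)) {
            current = extended;                  // keep growing the match
        } else {
            out.push_back(table[current]);       // emit the code for the match
            table[extended] = nextCode++;        // remember the new string
            current.assign(1, index);            // restart from this pixel
        }
    }
    if (!current.empty()) out.push_back(table[current]);
    out.push_back(eoiCode);
    return out;
}
```

Calling `lzwCodes({2, 1, 2, 1, 2, 0, 0, 2, 1, 2}, 2)` gives 4 2 1 6 2 0 0 8 5, the same codes as listed above.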

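When you output #6, you add "yry" to your code table, and its index is #8.

Because 8 = 2 ^ (current word size)

(current word size = 2(original) + 1(reserved) = 3)

the next output should use a larger word size, so the following #2 becomes a 4-bit word.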
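
To make that rule concrete, here is a sketch that walks an encoder's code sequence and reports the width each code would be written with. It assumes every data code adds exactly one table entry (the interpretation preferred above) and caps the width at GIF's 12-bit maximum. `codeWidths` is a hypothetical name.

```cpp
#include <cassert>
#include <vector>

// For each code in an encoder's output, returns how many bits it is written
// with. Rule: once table entry #(2^width) has been added, the next code uses
// one more bit. Illustrative only, with no handling for a full table.
std::vector<int> codeWidths(const std::vector<int>& codes, int minCodeSize) {
    assert(minCodeSize >= 2 && minCodeSize <= 8);
    const int clearCode = 1 << minCodeSize;
    const int eoiCode = clearCode + 1;
    int width = minCodeSize + 1;
    int nextCode = eoiCode + 1;   // index the next new table entry will get

    std::vector<int> widths;
    for (int code : codes) {
        widths.push_back(width);
        if (code == clearCode) {          // table reset: back to starting width
            width = minCodeSize + 1;
            nextCode = eoiCode + 1;
        } else if (code != eoiCode) {     // data code: one new entry is added
            if (nextCode == (1 << width) && width < 12) ++width;
            ++nextCode;
        }
    }
    return widths;
}
```

For the sequence 4 2 1 6 2 0 0 8 5 with a minimum code size of 2, this yields widths 3 3 3 3 4 4 4 4 4: the switch to 4 bits happens right after #6, when entry #8 is added.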

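The final output sequence, with #4 #2 #1 #6 written as 3-bit words and the remaining codes as 4-bit words, is:

    100 010 001 110 0010 0000 0000 1000 0101

After packing the bits into bytes (least significant bit first), it becomes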

54 2C 00 58
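
The packing step can be sketched as follows. The code widths are passed in explicitly (3 bits for the first four codes, 4 bits for the rest, as worked out above), and `packCodes` is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Packs codes into bytes, least significant bit first, as GIF expects.
// widths[i] is the bit width of codes[i]; a final partial byte is zero-padded.
std::vector<std::uint8_t> packCodes(const std::vector<int>& codes,
                                    const std::vector<int>& widths) {
    assert(codes.size() == widths.size());
    std::vector<std::uint8_t> bytes;
    std::uint32_t buffer = 0;  // pending bits, oldest in the lowest positions
    int bitCount = 0;          // number of pending bits in buffer
    for (std::size_t i = 0; i < codes.size(); ++i) {
        buffer |= static_cast<std::uint32_t>(codes[i]) << bitCount;
        bitCount += widths[i];
        while (bitCount >= 8) {            // flush every completed byte
            bytes.push_back(static_cast<std::uint8_t>(buffer & 0xFF));
            buffer >>= 8;
            bitCount -= 8;
        }
    }
    if (bitCount > 0) bytes.push_back(static_cast<std::uint8_t>(buffer & 0xFF));
    return bytes;
}
```

With codes 4 2 1 6 2 0 0 8 5 and widths 3 3 3 3 4 4 4 4 4, the result is the four bytes 54 2C 00 58.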

So the data block is

02            -minimum word size     
04            -data length
54 2c 00 58   -data
00            -data block terminator
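
The layout above (minimum code size, length-prefixed data, zero terminator) can be sketched in code as well. Per the GIF89a spec, each sub-block holds at most 255 data bytes, so longer streams are split. `frameImageData` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Wraps LZW bytes the way GIF image data is stored: the minimum code size,
// then sub-blocks of at most 255 bytes (each preceded by its length),
// then a zero-length block as the terminator. Illustrative sketch only.
std::vector<std::uint8_t> frameImageData(const std::vector<std::uint8_t>& lzwBytes,
                                         int minCodeSize) {
    assert(minCodeSize >= 2 && minCodeSize <= 8);
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(minCodeSize));
    for (std::size_t pos = 0; pos < lzwBytes.size(); pos += 255) {
        const std::size_t remaining = lzwBytes.size() - pos;
        const std::size_t len = remaining < 255 ? remaining : 255;
        out.push_back(static_cast<std::uint8_t>(len));      // sub-block length
        out.insert(out.end(), lzwBytes.begin() + pos, lzwBytes.begin() + pos + len);
    }
    out.push_back(0x00);  // data block terminator
    return out;
}
```

Passing `{0x54, 0x2C, 0x00, 0x58}` with a minimum code size of 2 produces 02 04 54 2C 00 58 00, matching the block shown above.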
