LZW算法解压难理解

Question

我使用 LZW 压缩算法压缩了以下消息：“ababcbababaaaaaaa”。

当 a=1;b=2;c=3 时，我得到以下消息：“1 2 4 3 5 8 1 10 11 1”，这与我的教授在我们的练习笔记中得到的结果相匹配。

但是，当我尝试解压缩邮件时，我得到以下事件序列：

Current: a  ; Next: b  ; Output: a  ; AddToDict: ab=4;

Current: b  ; Next: a  ; Output: b  ; AddToDict: ba=5;

Current: ab ; Next: c  ; Output: ab ; AddToDict: abc=6;

Current: c  ; Next: ba ; Output: c  ; AddToDict: cb=7;

Current: ba ; Next: 8? ; Output: ?  ; AddToDict: ?;

如您所见，我的问题是我的字典中还没有 8（应该是“bab”）。

我做错了什么？

压缩得到的完整字典为

(1=a;2=b;3=c;4=ab;5=ba;6=abc;7=cb;8=bab;9=baba;10=aa;11=aaa; 12=aaaa)

Answer 1

你要小心了！第一列是“当前序列”，但第二列是“下一个字符”（不是“下一个序列”）。因此，你在写“next: ba”时犯了错误。顺便说一下，直到最后一行是正确的：

Current: ba  ; Next: b   ; Output: ba   ; AddToDict: bab = 8;   // here is 5
Current: bab ; Next: a   ; Output: bab  ; AddToDict: baba = 9;  // here you can find 8
Current: a   ; Next: a   ; Output: a    ; AddToDict: aa = 10;   // here is 1 
Current: aa  ; Next: a   ; Output: aa   ; AddToDict: aaa = 11;  // here is 10
Current: aaa ; Next: a   ; Output: aaa  ; AddToDict: aaaa = 12; // here is 11
Current: a   ; Next: #   ; Output: a    ; #                     // here is 1

Answer 2

您正在考虑 Current Next。我根据 Previous Current 来查看解码。因此，在阅读此答案时请注意 Current 是消息中的当前编号（请参阅下面 table 中标记为 Current 的列，并观察它包含要解码的消息） .

请注意，根据我的定义，消息的第一个代码必须被视为特殊情况，因为没有 Previous 代码。消息中的第一个代码始终是单个字母代码，处理该代码时不会创建字典条目。

对于消息中的所有其他代码，有两种可能性：

典型：Current代码有字典条目。在这种情况下，输出是 Current 代码的字典字符串。新字典字符串是通过获取 Previous 代码的字符串，并附加 Current 代码的字符串的第一个字母形成的。
古怪：Current 代码还没有字典条目。在这种情况下，输出是 Previous 代码的字符串，加上 Previous 代码的字符串的第一个字母。新字典字符串是通过获取 Previous 代码的字符串，并附加 Previous 代码的字符串的第一个字母形成的。

考虑到这一点，消息的解码方式如下：

Previous  Current  Output    Dictionary  Type
    -        1       a        -          special
    1        2       b        4: ab      typical
    2        4       ab       5: ba      typical
    4        3       c        6: abc     typical
    3        5       ba       7: cb      typical
    5        8       bab      8: bab     quirky: dict[5] is "ba" plus first letter of dict[5] is "b" means dict[8] is "bab"
    8        1       a        9: baba    typical
    1       10       aa      10: aa      quirky
   10       11       aaa     11: aaa     quirky
   11        1       a       12: aaaa    typical

注意在古怪的情况下，输出和新的字典条目是相同的。它们是使用专门来自 Previous 代码的信息生成的。 Current 代码在解码古怪的情况时被忽略，新的字典条目是 Current 代码的条目。

LZW算法解压难理解

Difficulty understanding decompression with LZW algorithm

compression

algorithm

information-theory

lzw