在 C# 中读取霍夫曼编码树

Question

我正在尝试构建一个 Huffman 编码器，它可以从文本文件中读取字节并将它们转换为 Huffman 代码。我需要能够解码。

到目前为止，我已经能够从文件中读取字节并将它们放入基于表示霍夫曼编码的双链表的树中。我有一个包含以下文本的文本文件："abcabcabd"。这是我生成的树：

我的目标是读出树并将整个输出放入一个数组中。我预期的数组如下所示："1, 01, 000, 1, 01, 000, 1, 01, 001"

唯一的问题是我不知道如何确定字母的位置。我想用 1/0 方法来做。

所以我的问题是：如何确定 'byte' 代码的位置。我会在 while 循环中做吗？我希望它是可变的，因此无法创建库。我试过这个：

首先我创建我的树：

public static void mBitTree()
    {
        W = H;
        if (W.N != null)
        {
            int Fn = W.A;
            int Sn = W.N.A;
            int Cn = Fn + Sn;

            cZip n = new cZip(0, Cn);

            //First Set new node to link to subnodes.
            n.L = W;
            n.R = W.N;

            if (W.N.N != null)
            {
                n.N = H.N.N;
                H.N.N.P = n;
                H.N.N = n;
                //Safe and set new head and fix links.
                W = H.N.N;
                H.N.N = null;
                H.N.P = null;
                H.N = null;
                H = W;
            }
            //this means there were 2 nodes left. so the newly created one will become Head ant Tail and the tree is complete.
            else
            {
                H.N.P = null;
                H.N = null;
                H = n;
                T = n;
            }
        } 
        else
        {
            return;
        }
    }

我开始的顶部节点是 H 和 T。它还有一个 L 和 R。首先需要检查的是 H.L 不等于 current。然后去H.R.L，然后H.R.R.L等等。如果我有一个更大的文本文件，这也需要工作。所以我创建了这样的东西：

public static string[] mCodedchain(uint[] count, byte[] bytesInFile)
    {
        int counter = 0;                                                                
        for (int i = 0; i < count.Length; i++) { if (count[i] != 0) counter = counter + (int)count[i]; }
        string[] codedArray = new string[counter];


        for (int i = 0; i < bytesInFile.Length; i++)
        {
            int number = 0;
            while ((int)bytesInFile[i] != number )
            {
                number = H.L.V //and if not H.L Add R in between to go to H.R.L.V
            }               
        }            

        return codedArray;
    }

但不知道如何构建 while 循环。我已经尝试了很多东西，但这个似乎是最有效的，但我无法让它工作。

我不太确定这个问题是否清楚。但我希望是。

提前致谢，祝编码愉快。

Answer 1

解码静态霍夫曼数据其实很简单。

你需要什么：

带有符号的节点树（letters/bytes 在你的例子中）附加在与编码期间使用的相同配置中
一种从流中读取位的方法，一次一位

有了这些，下面是进行解码的伪代码：

<node> ← <root>
WHILE MORE
    <bit> ← <ReadBit()>
    IF <bit> IS 1 THEN
        <node> ← <node.LEFT>
    ELSE
        <node> ← <node.RIGHT>
    END IF
    IF <node.SYMBOL> THEN
        OUTPUT <node.SYMBOL> AS DECODED SYMBOL
        <node> ← <root>
    END IF
END WHILE

或者用简单的英语：

Start at the root node.
Read 1 bit from the stream. If the bit is 1, go to the left child of the current node, otherwise go to the right child.

Whenever you hit a node with a symbol in it, output the symbol and go back to the root node Go back to "read 1 bit" and keep going until you've decoded the entire stream

注意！您需要知道要分别输出多少个符号。这样做的原因是编码流中的最后一个字节可能有额外的位，如果你也解码这些位，与编码前的原始数据相比，你最终可能会得到额外的符号。

在 C# 中读取霍夫曼编码树

Reading the Huffman encoding tree in C#

c#

tree

loops

huffman-code

doubly-linked-list