哈夫曼编码，如何处理字节的压缩？

Question

我的霍夫曼压缩程序应该能够压缩任何类型的文件。这就是为什么我使用 FileInputStream 从文件中读取字节而不是字符。

第一步，创建的频率table是一个整数数组。

    /**
 * Initialises field frequencyTable.
 * Opens file, reads one byte at a time and counts each bytes frequency.
 * @param file the file to be read.
 *
 */
private void buildFrequencyTable(String file){

    //since we are reading one byte at a time there are 256 possible values.
    this.frequencyTable = new int[256];

    try{

        FileInputStream in = new FileInputStream(file);
        int currentByte;

        while((currentByte = in.read())!=-1){

            //add that byte to frequencyTable and increment it.
            this.frequencyTable[currentByte] ++;
        }

    }catch (IOException e){

        e.printStackTrace();
    }
}

第 2 步：我创建霍夫曼树。

this.pq 是一个优先级队列。 pq 中的所有对象都是为每个唯一字节创建的节点。每个节点都有一个表示字节的整数值。另一个整数值表示该字节的频率。节点与其频率相当，因此在构建哈夫曼树时，频率最高的字节使用更少的位等进行编码。

节点class:

class Node implements Comparable<Node>{

private int value;
private int frequency;
private Node left_child;
private Node right_child;

Node(int value, int frequency, Node left_child, Node right_child){

    this.value = value;
    this.frequency = frequency;
    this.left_child = left_child;
    this.right_child = right_child;
}
//Checks if two nodes have equal frequency.
private boolean equals(Node n){

    return this.frequency == n.frequency;
}



@Override
public int compareTo(Node other) {

    if(this.equals(other)){
        return 0;
    }else if(this.frequency > other.frequency){
        return 1;
    }else return -1;

}

哈夫曼树生成器方法：

private void createHuffmanTree(){

    while(this.pq.size() > 1){

        Node left = this.pq.poll();
        Node right = this.pq.poll();
        //null character means its not a leaf node.
        Node parent = new Node('[=12=]',left.getFrequency()+right.getFrequency(), left, right);
        this.pq.add(parent);
    }

    //The last item in priority queue will be the final huffman tree node.
    this.huffmanTree = this.pq.poll();


}

我的问题：

步骤 3. 进行压缩。

所以现在我有一个霍夫曼树可以用来创建原始文件的编码版本。但是我不明白这一步。

如果我创建一个将每个字节映射到其霍夫曼位编码的 hashMap。我是否将编码作为字符串放在地图中？

如果是这样，如果我创建一个名为 Compress() 的方法。然后我一次一个字节地再次读取原始文件。我将如何使用这个 hashMap 来： 1. 从文件中找到作为当前字节的键值， 2.找到它对应的位编码（即字符串）并将这些单独的位写入另一个压缩版本的文件？

例如。我有字节 65 (A)，它的编码为“11”。然后压缩方法将输出：11 11 11（带有额外填充）

注意！我得到了一些实现 class 的代码：BitFileReader。这个class读取一个文件，一次可以读取一位。虽然我不知道如何在我的程序中使用这个 class 。

Answer 1

Pseudo-code（非常）

void convert(InputStream in, BitFileWriter out) {
    HuffmanTree huffman = new HuffmanTree()
    int b;
    while ((b = in.read()) != -1) {
        huffman.add(b);
        Bits bits = huffman.get(b);
        for (int bit : bits.values()) {
             out.writeBit(bit);
        }
    }
    out.close();
}

位可能只是 int[]。或者使用 BitSet:

class Bits {
    final BitSet bitset;
    final int nbits;

    Bits(BitSet bitset, int nbits) {
        this.bitset = bitset;
        this.nbits = nbits;
    }

    int[] values() {
        int[] v = new int[nbits];
        for (int i = bitset.nextSetBit(0); i >= 0; i = bs.nextSetBit(i + 1)) {
            v[i] = 1;
        }
        return v;
    }
}

哈夫曼编码，如何处理字节的压缩？

Huffman encoding, how to do the compression dealing with bytes?

java

bit-manipulation

huffman-code