Java - 我的 Huffman 解压拒绝解压非文本文件(returns 空文件)

Java - My Huffman decompression refuses to decompress non-text files (returns empty file)

我可以压缩所有类型的文件(.jpg、.mp4 等),但是当我尝试解压这些非文本文件时,程序只是 return 一个空的解压文件...奇怪的是我能够很好地解压缩纯文本文件。

当我压缩我的原始文件时,我将重建树所需的数据和编码位放在同一个文件中。格式看起来像这样:

<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]

其中 n 是唯一字节的总数(也就是我的树中的叶子数),值只是我的字节形式的叶子值,频率是每个字节的频率/"character"(频率是一个 int 值,因此它由每个频率 4 个字节组成。

我的代码中的 BitFileReader 和 BitFileWriter 只是 BufferedOutStream/InputStream 的包装器 类,一点一点地增加了 reading/writing 的功能。

我在下面添加了我的整个 Huffman 代码,但重点是底部的 compress() 和 decompress() 方法。至少我想知道我对这些方法的逻辑是否正确,如果是这样,是什么导致它在解压缩其他文件类型(不是纯文本文件)时 return 清空解压缩文件?

霍夫曼编码:

public class HuffmanCode {


    public static Tree buildTree(int[] charFreqs) {
        PriorityQueue<Tree> trees = new PriorityQueue<Tree>();

        for (int i = 0; i < charFreqs.length; i++){
            if (charFreqs[i] > 0){
                trees.offer(new Leaf(charFreqs[i], i));
            }
        }

        //assert trees.size() > 0;

        while (trees.size() > 1) {
            Tree a = trees.poll();
            Tree b = trees.poll();

            trees.offer(new Node(a, b));
        }
        return trees.poll();
    }

    public static void printStruct(Tree tree) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + " " + leaf.frequency);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            printStruct(node.left);

            // traverse right
            printStruct(node.right);
        }
    }


    public static void printStruct(Tree tree, StringBuffer prefix) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            printStruct(node.left, prefix);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            printStruct(node.right, prefix);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(leaf.value, prefix.toString());

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillEncodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillEncodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }

    public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
        //assert tree != null;
        if (tree instanceof Leaf) {
            Leaf leaf = (Leaf)tree;

            treeMap.put(prefix.toString(), leaf.value);

        } else if (tree instanceof Node) {
            Node node = (Node)tree;

            // traverse left
            prefix.append('0');
            fillDecodeMap(node.left, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);

            // traverse right
            prefix.append('1');
            fillDecodeMap(node.right, prefix, treeMap);
            prefix.deleteCharAt(prefix.length()-1);
        }
    }



    public static void compress(File file){
        try {
            Path path = Paths.get(file.getAbsolutePath());
            byte[] content = Files.readAllBytes(path);
            TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
            File nF = new File(file.getName()+"_comp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);

            int[] charFreqs = new int[256];

            // read each byte and record the frequencies
            for (byte b : content){
                charFreqs[b&0xFF]++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillEncodeMap(tree, new StringBuffer(), encodeMap);

            // Writes tree structure in binary form to nF (new file)
            bfw.writeByte(encodeMap.size());
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] != 0){
                    ByteBuffer b = ByteBuffer.allocate(4);
                    b.putInt(charFreqs[i]);
                    byte[] result = b.array();

                    bfw.writeByte(i);
                    for(int j=0; j<4;j++){
                        bfw.writeByte(result[j]&0xFF);
                    }
                }
            }

            // Write compressed data
            for(byte b : content){
                String code = encodeMap.get(b&0xFF);
                for(char c : code.toCharArray()){
                    if(c == '1'){
                        bfw.write(1);
                    }
                    else{
                        bfw.write(0);
                    }
                }
            }
            bfw.close();
            System.out.println("Compression successful!");

        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void decompress(File file){
        try {
            BitFileReader bfr = new BitFileReader(file);
            int[] charFreqs = new int[256];
            TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
            File nF = new File(file.getName()+"_decomp");
            nF.createNewFile();
            BitFileWriter bfw = new BitFileWriter(nF);
            DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));

            int uniqueBytes;
            int counter = 0;
            int byteCount = 0;
            uniqueBytes = data.readUnsignedByte();

            // Read frequency table
            while (counter < uniqueBytes){
              int index = data.readUnsignedByte();
              int freq = data.readInt();
              charFreqs[index] = freq;
              counter++;
            }

            // build tree
            Tree tree = buildTree(charFreqs);

            // build TreeMap
            fillDecodeMap(tree, new StringBuffer(), decodeMap);

            // Skip BitFileReader position to actual compressed code
            bfr.skip(uniqueBytes*5);

            // Get total number of compressed bytes
            for(int i=0; i<charFreqs.length; i++){
                if(charFreqs[i] > 0){
                    byteCount += charFreqs[i];
                }
            }

            // Decompress data and write
            counter = 0;
            StringBuffer code = new StringBuffer();

            while(bfr.hasNextBit() && counter < byteCount){
                code.append(""+bfr.nextBit());

                if(decodeMap.containsKey(code.toString())){
                    bfw.writeByte(decodeMap.get(code.toString()));
                    code.setLength(0);
                    counter++;
                }
            }
            bfw.close();
            bfr.close();
            data.close();

            System.out.println("Decompression successful!");

        } 

        catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
        File f = new File("test");
        compress(f);
        f = new File("test_comp");
        decompress(f);
    }
}

编辑:我找到了原因,但我不知道如何解决或为什么会发生。问题是我的解压缩方法中的 charFreqs[] 数组永远不会被填充(它的所有值仍然为零,也就是根据数组,所有字节的频率都为零)。

我解决了!问题出在我的 compress() 方法中的 bfw.writeByte(encodeMap.size()) 行。它只会将字节写入文件,但 encodeMap.size() 函数可以 return 如果它已满,则值为 256。 256 是一个比一个字节可以容纳的值更高的值(bfw.writeByte() 实际上将一个 int 作为参数,但它只写入 int 的最低 8 位,基本上只写入一个字节可以容纳的位,所以在某种程度上它实际上具有无符号字节的范围 0-255)。

我通过更改两行代码解决了这个问题。我的 compress() 方法中的行 bfw.writeByte(encodeMap.size()) 更改为 bfw.writeByte(encodeMap.size()-1),我的 decompress() 方法中的行 uniqueBytes = data.readUnsignedByte() 更改为 uniqueBytes = data.readUnsignedByte() + 1