Java - 我的 Huffman 解压拒绝解压非文本文件(returns 空文件)
Java - My Huffman decompression refuses to decompress non-text files (returns empty file)
我可以压缩所有类型的文件(.jpg、.mp4 等),但是当我尝试解压这些非文本文件时,程序只是 return 一个空的解压文件...奇怪的是我能够很好地解压缩纯文本文件。
当我压缩我的原始文件时,我将重建树所需的数据和编码位放在同一个文件中。格式看起来像这样:
<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]
其中 n 是唯一字节的总数(也就是我的树中的叶子数),值只是我的字节形式的叶子值,频率是每个字节的频率/"character"(频率是一个 int 值,因此它由每个频率 4 个字节组成。
我的代码中的 BitFileReader 和 BitFileWriter 只是 BufferedOutStream/InputStream 的包装器 类,一点一点地增加了 reading/writing 的功能。
我在下面添加了我的整个 Huffman 代码,但重点是底部的 compress() 和 decompress() 方法。至少我想知道我对这些方法的逻辑是否正确,如果是这样,是什么导致它在解压缩其他文件类型(不是纯文本文件)时 return 清空解压缩文件?
霍夫曼编码:
public class HuffmanCode {
public static Tree buildTree(int[] charFreqs) {
PriorityQueue<Tree> trees = new PriorityQueue<Tree>();
for (int i = 0; i < charFreqs.length; i++){
if (charFreqs[i] > 0){
trees.offer(new Leaf(charFreqs[i], i));
}
}
//assert trees.size() > 0;
while (trees.size() > 1) {
Tree a = trees.poll();
Tree b = trees.poll();
trees.offer(new Node(a, b));
}
return trees.poll();
}
public static void printStruct(Tree tree) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + " " + leaf.frequency);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
printStruct(node.left);
// traverse right
printStruct(node.right);
}
}
public static void printStruct(Tree tree, StringBuffer prefix) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
printStruct(node.left, prefix);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
printStruct(node.right, prefix);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(leaf.value, prefix.toString());
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillEncodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillEncodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(prefix.toString(), leaf.value);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillDecodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillDecodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void compress(File file){
try {
Path path = Paths.get(file.getAbsolutePath());
byte[] content = Files.readAllBytes(path);
TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
File nF = new File(file.getName()+"_comp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
int[] charFreqs = new int[256];
// read each byte and record the frequencies
for (byte b : content){
charFreqs[b&0xFF]++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillEncodeMap(tree, new StringBuffer(), encodeMap);
// Writes tree structure in binary form to nF (new file)
bfw.writeByte(encodeMap.size());
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] != 0){
ByteBuffer b = ByteBuffer.allocate(4);
b.putInt(charFreqs[i]);
byte[] result = b.array();
bfw.writeByte(i);
for(int j=0; j<4;j++){
bfw.writeByte(result[j]&0xFF);
}
}
}
// Write compressed data
for(byte b : content){
String code = encodeMap.get(b&0xFF);
for(char c : code.toCharArray()){
if(c == '1'){
bfw.write(1);
}
else{
bfw.write(0);
}
}
}
bfw.close();
System.out.println("Compression successful!");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void decompress(File file){
try {
BitFileReader bfr = new BitFileReader(file);
int[] charFreqs = new int[256];
TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
File nF = new File(file.getName()+"_decomp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
int uniqueBytes;
int counter = 0;
int byteCount = 0;
uniqueBytes = data.readUnsignedByte();
// Read frequency table
while (counter < uniqueBytes){
int index = data.readUnsignedByte();
int freq = data.readInt();
charFreqs[index] = freq;
counter++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillDecodeMap(tree, new StringBuffer(), decodeMap);
// Skip BitFileReader position to actual compressed code
bfr.skip(uniqueBytes*5);
// Get total number of compressed bytes
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] > 0){
byteCount += charFreqs[i];
}
}
// Decompress data and write
counter = 0;
StringBuffer code = new StringBuffer();
while(bfr.hasNextBit() && counter < byteCount){
code.append(""+bfr.nextBit());
if(decodeMap.containsKey(code.toString())){
bfw.writeByte(decodeMap.get(code.toString()));
code.setLength(0);
counter++;
}
}
bfw.close();
bfr.close();
data.close();
System.out.println("Decompression successful!");
}
catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
File f = new File("test");
compress(f);
f = new File("test_comp");
decompress(f);
}
}
编辑:我找到了原因,但我不知道如何解决或为什么会发生。问题是我的解压缩方法中的 charFreqs[] 数组永远不会被填充(它的所有值仍然为零,也就是根据数组,所有字节的频率都为零)。
我解决了!问题出在我的 compress()
方法中的 bfw.writeByte(encodeMap.size())
行。它只会将字节写入文件,但 encodeMap.size()
函数可以 return 如果它已满,则值为 256。 256 是一个比一个字节可以容纳的值更高的值(bfw.writeByte()
实际上将一个 int 作为参数,但它只写入 int 的最低 8 位,基本上只写入一个字节可以容纳的位,所以在某种程度上它实际上具有无符号字节的范围 0-255)。
我通过更改两行代码解决了这个问题。我的 compress()
方法中的行 bfw.writeByte(encodeMap.size())
更改为 bfw.writeByte(encodeMap.size()-1)
,我的 decompress()
方法中的行 uniqueBytes = data.readUnsignedByte()
更改为 uniqueBytes = data.readUnsignedByte() + 1
我可以压缩所有类型的文件(.jpg、.mp4 等),但是当我尝试解压这些非文本文件时,程序只是 return 一个空的解压文件...奇怪的是我能够很好地解压缩纯文本文件。
当我压缩我的原始文件时,我将重建树所需的数据和编码位放在同一个文件中。格式看起来像这样:
<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]
其中 n 是唯一字节的总数(也就是我的树中的叶子数),值只是我的字节形式的叶子值,频率是每个字节的频率/"character"(频率是一个 int 值,因此它由每个频率 4 个字节组成。
我的代码中的 BitFileReader 和 BitFileWriter 只是 BufferedOutStream/InputStream 的包装器 类,一点一点地增加了 reading/writing 的功能。
我在下面添加了我的整个 Huffman 代码,但重点是底部的 compress() 和 decompress() 方法。至少我想知道我对这些方法的逻辑是否正确,如果是这样,是什么导致它在解压缩其他文件类型(不是纯文本文件)时 return 清空解压缩文件?
霍夫曼编码:
public class HuffmanCode {
public static Tree buildTree(int[] charFreqs) {
PriorityQueue<Tree> trees = new PriorityQueue<Tree>();
for (int i = 0; i < charFreqs.length; i++){
if (charFreqs[i] > 0){
trees.offer(new Leaf(charFreqs[i], i));
}
}
//assert trees.size() > 0;
while (trees.size() > 1) {
Tree a = trees.poll();
Tree b = trees.poll();
trees.offer(new Node(a, b));
}
return trees.poll();
}
public static void printStruct(Tree tree) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + " " + leaf.frequency);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
printStruct(node.left);
// traverse right
printStruct(node.right);
}
}
public static void printStruct(Tree tree, StringBuffer prefix) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
printStruct(node.left, prefix);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
printStruct(node.right, prefix);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(leaf.value, prefix.toString());
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillEncodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillEncodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(prefix.toString(), leaf.value);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillDecodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillDecodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void compress(File file){
try {
Path path = Paths.get(file.getAbsolutePath());
byte[] content = Files.readAllBytes(path);
TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
File nF = new File(file.getName()+"_comp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
int[] charFreqs = new int[256];
// read each byte and record the frequencies
for (byte b : content){
charFreqs[b&0xFF]++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillEncodeMap(tree, new StringBuffer(), encodeMap);
// Writes tree structure in binary form to nF (new file)
bfw.writeByte(encodeMap.size());
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] != 0){
ByteBuffer b = ByteBuffer.allocate(4);
b.putInt(charFreqs[i]);
byte[] result = b.array();
bfw.writeByte(i);
for(int j=0; j<4;j++){
bfw.writeByte(result[j]&0xFF);
}
}
}
// Write compressed data
for(byte b : content){
String code = encodeMap.get(b&0xFF);
for(char c : code.toCharArray()){
if(c == '1'){
bfw.write(1);
}
else{
bfw.write(0);
}
}
}
bfw.close();
System.out.println("Compression successful!");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void decompress(File file){
try {
BitFileReader bfr = new BitFileReader(file);
int[] charFreqs = new int[256];
TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
File nF = new File(file.getName()+"_decomp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
int uniqueBytes;
int counter = 0;
int byteCount = 0;
uniqueBytes = data.readUnsignedByte();
// Read frequency table
while (counter < uniqueBytes){
int index = data.readUnsignedByte();
int freq = data.readInt();
charFreqs[index] = freq;
counter++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillDecodeMap(tree, new StringBuffer(), decodeMap);
// Skip BitFileReader position to actual compressed code
bfr.skip(uniqueBytes*5);
// Get total number of compressed bytes
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] > 0){
byteCount += charFreqs[i];
}
}
// Decompress data and write
counter = 0;
StringBuffer code = new StringBuffer();
while(bfr.hasNextBit() && counter < byteCount){
code.append(""+bfr.nextBit());
if(decodeMap.containsKey(code.toString())){
bfw.writeByte(decodeMap.get(code.toString()));
code.setLength(0);
counter++;
}
}
bfw.close();
bfr.close();
data.close();
System.out.println("Decompression successful!");
}
catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
File f = new File("test");
compress(f);
f = new File("test_comp");
decompress(f);
}
}
编辑:我找到了原因,但我不知道如何解决或为什么会发生。问题是我的解压缩方法中的 charFreqs[] 数组永远不会被填充(它的所有值仍然为零,也就是根据数组,所有字节的频率都为零)。
我解决了!问题出在我的 compress()
方法中的 bfw.writeByte(encodeMap.size())
行。它只会将字节写入文件,但 encodeMap.size()
函数可以 return 如果它已满,则值为 256。 256 是一个比一个字节可以容纳的值更高的值(bfw.writeByte()
实际上将一个 int 作为参数,但它只写入 int 的最低 8 位,基本上只写入一个字节可以容纳的位,所以在某种程度上它实际上具有无符号字节的范围 0-255)。
我通过更改两行代码解决了这个问题。我的 compress()
方法中的行 bfw.writeByte(encodeMap.size())
更改为 bfw.writeByte(encodeMap.size()-1)
,我的 decompress()
方法中的行 uniqueBytes = data.readUnsignedByte()
更改为 uniqueBytes = data.readUnsignedByte() + 1