使用 BWT 进行文本压缩和解压缩
Text Compression and Decompression using BWT
请问能否将BWT MTF和Huffman算法结合起来,在java中获得更高的压缩率?过程是怎样的?
编写 MTF 文件时出错?
public class MTF{
static File f=new File("MTF.txt");
public static File encode(String msg, String symTable)throws Exception{
if(!f.exists())
f.createNewFile();
StringBuilder s = new StringBuilder(symTable);
for(char c : msg.toCharArray()){
int idx = s.indexOf("" + c);
FileWriter writer = new FileWriter(f);
writer.write(idx+" ");
System.out.print(idx+" ");
writer.flush();
writer.close();
s = s.deleteCharAt(idx).insert(0, c);
}
System.out.println("MTF done");
return f;
}
检验这个假设很容易,过程是:
- 取一组有代表性的字符串(您的程序将在 "real world" 中处理的字符串);
- 使用 BWT MTF 编码(互联网上有很多实现);
- 用霍夫曼压缩;
一般来说:应用 MTF 应该可以提高可压缩性,例如这里提到的:http://michael.dipperstein.com/bwt/
BWT is useful because it converts the data into a format that is
generally more compressible by run length encoders and statistical
encoders with order greater than 0. By additionally applying
move-to-front coding, the data will be in a format which is generally
more compressible by even zero order statistical encoders such as
traditional implementations of Huffman coding or arithmetic coding.
请问能否将BWT MTF和Huffman算法结合起来,在java中获得更高的压缩率?过程是怎样的? 编写 MTF 文件时出错?
public class MTF{
static File f=new File("MTF.txt");
public static File encode(String msg, String symTable)throws Exception{
if(!f.exists())
f.createNewFile();
StringBuilder s = new StringBuilder(symTable);
for(char c : msg.toCharArray()){
int idx = s.indexOf("" + c);
FileWriter writer = new FileWriter(f);
writer.write(idx+" ");
System.out.print(idx+" ");
writer.flush();
writer.close();
s = s.deleteCharAt(idx).insert(0, c);
}
System.out.println("MTF done");
return f;
}
检验这个假设很容易,过程是:
- 取一组有代表性的字符串(您的程序将在 "real world" 中处理的字符串);
- 使用 BWT MTF 编码(互联网上有很多实现);
- 用霍夫曼压缩;
一般来说:应用 MTF 应该可以提高可压缩性,例如这里提到的:http://michael.dipperstein.com/bwt/
BWT is useful because it converts the data into a format that is generally more compressible by run length encoders and statistical encoders with order greater than 0. By additionally applying move-to-front coding, the data will be in a format which is generally more compressible by even zero order statistical encoders such as traditional implementations of Huffman coding or arithmetic coding.