使用 BWT 进行文本压缩和解压缩

Text Compression and Decompression using BWT

请问能否将BWT MTF和Huffman算法结合起来,在java中获得更高的压缩率?过程是怎样的? 编写 MTF 文件时出错?

public class MTF{
    static File f=new File("MTF.txt");
public static File encode(String msg, String symTable)throws Exception{
            if(!f.exists())
                f.createNewFile();
    StringBuilder s = new StringBuilder(symTable);
    for(char c : msg.toCharArray()){
        int idx = s.indexOf("" + c);
                    FileWriter writer = new FileWriter(f); 
                    writer.write(idx+" "); 
                    System.out.print(idx+" ");
                    writer.flush();
                    writer.close();
        s = s.deleteCharAt(idx).insert(0, c);
    }
            System.out.println("MTF done");
    return f;
}

检验这个假设很容易,过程是:

  • 取一组有代表性的字符串(您的程序将在 "real world" 中处理的字符串);
  • 使用 BWT MTF 编码(互联网上有很多实现);
  • 用霍夫曼压缩;

一般来说:应用 MTF 应该可以提高可压缩性,例如这里提到的:http://michael.dipperstein.com/bwt/

BWT is useful because it converts the data into a format that is generally more compressible by run length encoders and statistical encoders with order greater than 0. By additionally applying move-to-front coding, the data will be in a format which is generally more compressible by even zero order statistical encoders such as traditional implementations of Huffman coding or arithmetic coding.