Huffman compression: reading the file does not copy all the bytes in the binary file (C++)

Question

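My program does Huffman compression, and everything works fine except for one annoying thing. When I read the bytes back from the compressed file, only about a third of them are copied and decompressed (turned back into plain text). I really don't know where the problem is. Here is the function that reads the bytes from the file and returns them in an STL container: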

template<class Container>
Container readcompressfile(string ifileloc) {
    ifstream ifile(ifileloc);

    if (!ifile) {
        throw runtime_error("Could not open " + ifileloc + " for reading");
    }

    noskipws(ifile);

    return Container(istream_iterator<uint8_t>(ifile), istream_iterator<uint8_t>());
}

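Here is how I call it in my decompression function, which in turn calls another function that I have included below it in case it matters (both are inside a class):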

void decompressfile(string loc) {
        vector<uint8_t> vecbytes(readcompressfile<vector<uint8_t>>(ifilelocation)); // Here is where I'm using the above function

        vector<uint8_t>::iterator iter = vecbytes.begin();

        uint8_t ctr = 0xFF;
        bitset<8> b2 = 0;
        string code = "";

        for (; iter != vecbytes.end(); ++iter) {
            b2 = ctr & *iter;

            for (int i = 7; i >= 0; i--) {
                code += to_string(b2[i]);
            }
        }

        decodetext(code, loc);
    }

    //Reads bits and outputs string
    void decodetext(string codetext, string ofileloc) {
        string code = "";
        string text = "";
        char lett;

        for each (char ct in codetext) {
            code += ct;
            lett = returncharmap(code);
            if (lett != NULL) {
                text += lett;
                code = "";
            }
        }

        ofstream ofile(ofileloc);
        ofile << text;
        ofile.close();
    }
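The decoding loop above relies on `for each`, which is a Visual C++ extension, and on comparing a `char` against `NULL`. As a minimal sketch of the same prefix-matching idea in standard C++, the version below assumes the code table is available as a map from bit strings to characters. The name `decode_bits` and the `table` parameter are hypothetical stand-ins for `returncharmap`, whose implementation is not shown in the post.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical stand-in for decodetext/returncharmap: the Huffman code
// table is passed in explicitly as "bit string -> character".
std::string decode_bits(const std::string& bits,
                        const std::unordered_map<std::string, char>& table) {
    std::string text;
    std::string code;
    for (char bit : bits) {
        code += bit;
        // find() reports a miss through end(), so no sentinel character
        // such as NULL is needed to mean "no match yet".
        auto it = table.find(code);
        if (it != table.end()) {
            text += it->second;  // a complete code word was matched
            code.clear();
        }
    }
    return text;
}
```

Because Huffman codes are prefix-free, growing `code` one bit at a time and emitting a character on the first match is enough; any bits left in `code` at the end (for example padding in the last byte) are simply dropped in this sketch.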

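The compression function converts a string of 1s and 0s into bits (I pack them into bytes) and stores them in a file; that part works fine. As for decompression, as you can see, I read the binary file in the readcompressfile(string ifileloc) function, put the bytes into a vector<uint8_t> container, convert them back into a string of 1s and 0s, and then convert that back into text. The bytes that do get copied decompress correctly.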
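The packing and unpacking steps described above could look roughly like the following. This is only a sketch under the assumption that bits are stored most significant bit first and that the last byte is padded with zero bits; the post does not show the real compression code, and the names `bits_to_bytes` and `bytes_to_bits` are hypothetical.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: packs a string of '0'/'1' characters into bytes,
// most significant bit first. The last byte is padded with zero bits.
std::vector<std::uint8_t> bits_to_bytes(const std::string& bits) {
    std::vector<std::uint8_t> bytes((bits.size() + 7) / 8, 0);
    for (std::size_t i = 0; i < bits.size(); ++i) {
        if (bits[i] == '1') {
            bytes[i / 8] |= static_cast<std::uint8_t>(0x80u >> (i % 8));
        }
    }
    return bytes;
}

// Hypothetical inverse: expands every byte into eight '0'/'1' characters.
// std::bitset<8>::to_string() already yields the most significant bit first.
std::string bytes_to_bits(const std::vector<std::uint8_t>& bytes) {
    std::string bits;
    bits.reserve(bytes.size() * 8);
    for (std::uint8_t b : bytes) {
        bits += std::bitset<8>(b).to_string();
    }
    return bits;
}
```

With this layout, any byte value can appear in the packed output, including 0x1A and 0x0D 0x0A, which is why the way the file is opened matters when reading it back.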

I displayed the size of the string before and after and here is the result

Note: I copied the readcompressfile(string ifileloc) function from someone on Stack Overflow, because it solved a problem I had earlier.

Answer 1

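My guess is that you are running on Windows, which interprets a ^Z character (byte 0x1A) in a text stream as an end-of-file indicator. Text is the default mode for ifstream.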

Instead of:

 ifstream ifile(ifileloc);

Use:

 ifstream ifile(ifileloc, ifstream::in | ifstream::binary);

As pointed out in the comments below, in text mode Windows also translates the "\r\n" character sequence into a single "\n" character.
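Putting the fix together, a binary-mode reader might look like the sketch below. It is not the code from the question: the function name `read_all_bytes` is hypothetical, and it swaps `istream_iterator<uint8_t>` for `std::istreambuf_iterator<char>`, which reads raw characters from the stream buffer and so does not need `noskipws`.

```cpp
#include <cstdint>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not from the original post): returns every byte of
// the file exactly as stored on disk.
std::vector<std::uint8_t> read_all_bytes(const std::string& path) {
    // std::ios::binary turns off text-mode translation, so ^Z and "\r\n"
    // are delivered unchanged on Windows.
    std::ifstream in(path, std::ios::in | std::ios::binary);
    if (!in) {
        throw std::runtime_error("Could not open " + path + " for reading");
    }
    // istreambuf_iterator performs unformatted reads and never skips
    // whitespace; each char is converted to uint8_t on insertion.
    return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                     std::istreambuf_iterator<char>());
}
```

The file should be written in binary mode as well (for example `ofstream ofile(path, ofstream::out | ofstream::binary)`), otherwise a 0x0A byte in the compressed data may be expanded to "\r\n" on Windows when it is written.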

c++

huffman-code