Run-length decompression using C++

Question

I have a text file containing a string that I have encoded.

Let's say it is: aaahhhhiii kkkjjhh ikl wwwwwweeeett

Here is the encoding code, which works fine:

void Encode(std::string &inputstring, std::string &outputstring)
{
    for (int i = 0; i < inputstring.length(); i++) {
        int count = 1;
        while (inputstring[i] == inputstring[i+1]) {
            count++;
            i++;
        }
        if(count <= 1) {
            outputstring += inputstring[i];
        } else {
            outputstring += std::to_string(count);
            outputstring += inputstring[i];
        }
    }
}

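The output is as expected: 3a4h3i 3k2j2h ikl 6w4e2t

Now I'd like to decompress the output, back to the original.

I have been struggling with this for a couple of days now.

My idea so far: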

void Decompress(std::string &compressed, std::string &original)
{
    char currentChar = 0;
    auto n = compressed.length();
    for(int i = 0; i < n; i++) {

        currentChar = compressed[i++];

        if(compressed[i] <= 1) {
            original += compressed[i];
        } else if (isalpha(currentChar)) {
            //
        } else {
            //
            int number = isnumber(currentChar).....
            original += number;
        }
    }
}

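I know my Decompress function looks a bit messy, but I'm pretty lost with this one. Sorry for that.

Maybe someone on Stack Overflow would like to help a lost beginner.

Thanks for any help, I appreciate it.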

Answer 1

#include <cctype>
#include <iostream>
#include <string>


void Encode(std::string& inputstring, std::string& outputstring)
{
    for (unsigned int i = 0; i < inputstring.length(); i++) {
        int count = 1;
        while (inputstring[i] == inputstring[i + 1]) {
            count++;
            i++;
        }
        if (count <= 1) {
            outputstring += inputstring[i];
        }
        else {
            outputstring += std::to_string(count);
            outputstring += inputstring[i];
        }
    }
}

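bool alpha_or_space(const char c)
{
    // cast first: passing a negative char to isalpha is undefined behavior
    return isalpha(static_cast<unsigned char>(c)) || c == ' ';
}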

void Decompress(std::string& compressed, std::string& original)
{
    size_t i = 0;
    size_t repeat;
    while (i < compressed.length())
    {
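        // copy unrepeated letters and spaces through unchanged
        while (i < compressed.length() && alpha_or_space(compressed[i]))
            original.push_back(compressed[i++]);

        // parse the (possibly multi-digit) repeat count
        repeat = 0;
        while (i < compressed.length() && isdigit(static_cast<unsigned char>(compressed[i])))
            repeat = 10 * repeat + (compressed[i++] - '0');

        // stop if the input ended after the literal characters
        if (i >= compressed.length())
            break;

        // expand the run; a character without a count (e.g. punctuation) occurs once
        auto char_to_unroll = compressed[i++];
        if (repeat == 0)
            repeat = 1;
        while (repeat--)
            original.push_back(char_to_unroll);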
    }
}

int main()
{
    std::string deco, outp, inp = "aaahhhhiii kkkjjhh ikl wwwwwweeeett";

    Encode(inp, outp);
    Decompress(outp, deco);

    std::cout << inp << std::endl << outp << std::endl<< deco;

    return 0;
}

Answer 2

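Assuming the input string cannot contain digits (your encoding cannot cover that case: for example, the strings "3a" and "aaa" would both result in the encoded string "3a"; how would you ever want to decode that again?), you can decompress as follows: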

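unsigned int num = 0;
for(auto c : compressed)
{
    if(std::isdigit(static_cast<unsigned char>(c)))
    {
        // accumulate a (possibly multi-digit) repeat count
        num = num * 10 + (c - '0');
    }
    else
    {
        num += num == 0; // no digit read yet: the character occurs once
        for(; num > 0; --num) // ends with num == 0, ready for the next run
        {
            original += c;
        }
    }
}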

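Untested code, though...

Characters in a string are really just numeric values, however. You can treat char (or signed char, unsigned char) as an ordinary 8-bit integer, and you can store a numeric value in such a byte as well. Usually, run-length encoding is done this way: count up to 255 equal characters, store the count in one byte and the character in another. A single "a" would be encoded as 0x01 0x61 (the latter being the ASCII value of a), "aa" would be 0x02 0x61, and so on. If you have to store more than 255 equal characters, you store two pairs: 0xff 0x61, 0x07 0x61 for a string containing the character a 262 times. Decoding then becomes trivial: you read the bytes in pairs, interpreting the first byte as the count and the second as the character; the rest is straightforward. You also cover digits nicely this way.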
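The byte-pair scheme described above could be sketched roughly as follows. The function names EncodePairs and DecodePairs are made up for illustration, and the sketch assumes the encoded data is kept in a std::string used as a plain byte buffer (so it is binary data, not printable text).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: encode as (count, character) byte pairs.
// Each count is a raw byte in the range 1..255; longer runs are split.
std::string EncodePairs(const std::string& input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size();) {
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == input[i] && run < 255)
            ++run;
        out += static_cast<char>(static_cast<unsigned char>(run)); // count byte
        out += input[i];                                            // character byte
        i += run;
    }
    return out;
}

// Hypothetical helper: decode by reading the bytes back in pairs.
// A trailing unpaired byte (malformed input) is ignored in this sketch.
std::string DecodePairs(const std::string& encoded)
{
    std::string out;
    for (std::size_t i = 0; i + 1 < encoded.size(); i += 2) {
        const auto count = static_cast<unsigned char>(encoded[i]);
        out.append(static_cast<std::size_t>(count), encoded[i + 1]);
    }
    return out;
}
```

With this layout, DecodePairs(EncodePairs(s)) should give back s for any s, including strings that contain digits. The price is that text with few repeats grows, since every single character costs two bytes.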

Answer 3

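Decompression cannot be done unambiguously, because you did not define a sentinel character; i.e. given the compressed stream, it is impossible to tell whether a digit is an original single digit or the count of an RLE repeat command. I would suggest using '0' as the sentinel character. While encoding, if you see a '0', just output 010. Any other char X is translated to 0NX, where N is the repeat count byte. If you go over 255, just output a new RLE repeat command.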
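One possible reading of this sentinel scheme, as a rough sketch only: the names EncodeSentinel, DecodeSentinel and kSentinel are invented here, the count N is assumed to be a raw byte (1..255) rather than a printed digit, and a lone character other than '0' is assumed to be copied through literally. None of these details are spelled out in the answer.

```cpp
#include <cstddef>
#include <string>

const char kSentinel = '0'; // sentinel character suggested above

// Hypothetical helper: runs (and every '0') become sentinel, count byte, character.
std::string EncodeSentinel(const std::string& input)
{
    std::string out;
    for (std::size_t i = 0; i < input.size();) {
        std::size_t run = 1;
        while (i + run < input.size() && input[i + run] == input[i] && run < 255)
            ++run;
        if (run == 1 && input[i] != kSentinel) {
            out += input[i]; // assumption: a lone non-sentinel character stays literal
        } else {
            out += kSentinel;
            out += static_cast<char>(static_cast<unsigned char>(run)); // count byte N
            out += input[i];                                            // character X
        }
        i += run;
    }
    return out;
}

// Hypothetical helper: only a sentinel starts a repeat command; everything else is literal.
std::string DecodeSentinel(const std::string& encoded)
{
    std::string out;
    for (std::size_t i = 0; i < encoded.size();) {
        if (encoded[i] != kSentinel) {
            out += encoded[i++];
            continue;
        }
        if (i + 2 >= encoded.size())
            break; // truncated command: this sketch simply stops
        const auto count = static_cast<unsigned char>(encoded[i + 1]);
        out.append(static_cast<std::size_t>(count), encoded[i + 2]);
        i += 3;
    }
    return out;
}
```

Under these assumptions "3a" stays as the two literal bytes '3' 'a', while "aaa" becomes '0', 0x03, 'a', so the two inputs no longer collide and both decode back to themselves.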

c++

compression

run-length-encoding