Boost binary archives - reducing size

Question

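I am trying to reduce the memory size of boost archives in C++.

One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude. For this reason, I am getting that an empty boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).

Is there any way to change this default behavior for ints?

Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?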

Answer 1

As Alexey said, in Boost you have to use smaller member variables. The only serializations that do better are, AFAIK, Google Protocol Buffers and ASN.1 PER.

GPB uses variable-length integers, so a value takes only as many bytes as its magnitude requires.
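To make the variable-length idea concrete, here is a minimal sketch of a base-128 varint codec in the style Protocol Buffers uses for unsigned integers. The function names `encode_varint` and `decode_varint` are made up for this illustration and are not part of Boost or of the protobuf API; negative values are not handled (protobuf uses a separate zigzag mapping for those).

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `value` to `out` as a base-128 varint: 7 payload bits per byte,
// least significant group first, high bit set on every byte except the last.
inline void encode_varint(std::uint64_t value, std::vector<std::uint8_t>& out) {
    while (value >= 0x80) {
        out.push_back(static_cast<std::uint8_t>(value | 0x80));
        value >>= 7;
    }
    out.push_back(static_cast<std::uint8_t>(value));
}

// Decode one varint starting at `pos` and advance `pos` past it.
// Returns false if the input is truncated or longer than 10 bytes.
inline bool decode_varint(const std::vector<std::uint8_t>& in,
                          std::size_t& pos, std::uint64_t& value) {
    value = 0;
    for (unsigned shift = 0; shift < 64 && pos < in.size(); shift += 7) {
        const std::uint8_t byte = in[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;  // last byte of this varint
    }
    return false;
}
```

With this scheme a value below 128 takes one byte and 300 takes two, whereas a fixed-width int always takes four.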

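ASN.1 PER handles it differently: in an ASN.1 schema you can define the valid range of each value. So if you declare an int field as having a valid range of 0 to 15, it will use only 4 bits. uPER goes further: it doesn't align the bits of a field to byte boundaries, which saves even more bits. uPER is what 3G and 4G use over the radio link, and it saves a lot of bandwidth.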
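The range-constrained, unaligned packing described above can be sketched roughly as follows. This is not an ASN.1 PER encoder (real PER has many more rules); `BitWriter` and `bits_for_range` are hypothetical names that only illustrate how a field limited to 0..15 can occupy 4 bits with no padding between fields.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal MSB-first bit writer; fields are not aligned to byte boundaries.
class BitWriter {
public:
    // Append the low `bits` bits of `value` (bits <= 32), most significant first.
    void write(std::uint32_t value, unsigned bits) {
        for (unsigned i = bits; i-- > 0;) {
            if (bit_count_ % 8 == 0) bytes_.push_back(0);  // start a new byte
            if ((value >> i) & 1u)
                bytes_.back() |= static_cast<std::uint8_t>(0x80u >> (bit_count_ % 8));
            ++bit_count_;
        }
    }
    const std::vector<std::uint8_t>& bytes() const { return bytes_; }
    std::size_t bit_count() const { return bit_count_; }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t bit_count_ = 0;
};

// Number of bits needed for a value constrained to the range [lo, hi].
inline unsigned bits_for_range(std::uint32_t lo, std::uint32_t hi) {
    std::uint32_t span = hi - lo;  // number of distinct values minus one
    unsigned bits = 0;
    while (span != 0) {
        ++bits;
        span >>= 1;
    }
    return bits;
}
```

For example, writing three fields constrained to 0..15 takes 12 bits (two bytes) instead of the 12 bytes that three 4-byte ints would need.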

AFAIK most other efforts rely on post-serialization compression with ZIP or similar. That works well for large amounts of data and is rubbish otherwise.

Answer 2

Q. I am trying to reduce the memory size of boost archives in C++.

See Boost C++ Serialization overhead
Q. One problem I have found is that Boost's binary archives default to using 4 bytes for any int, regardless of its magnitude.

That's because it is a serialization library, not a compression library.
Q. For this reason, I am getting that an empty boost binary archive takes 62 bytes while an empty text archive takes 40 (text representation of an empty text archive: 22 serialization::archive 14 0 0 1 0 0 0 0 0).

Use archive flags, e.g. from Boost Serialization : How To Predict The Size Of The Serialized Result?:
- Tune things (boost::archive::no_codecvt, boost::archive::no_header, disable tracking etc.)
Q. Is there any way to change this default behavior for ints?

No. There is BOOST_IS_BITWISE_SERIALIZABLE(T), though (see the example and explanation).

Q. Else, are there any other ways to optimize the size of a binary archive apart from using make_array for vectors?

Using make_array doesn't help for vector<int>:

Live On Coliru

#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

static auto const flags = boost::archive::no_header | boost::archive::no_tracking;

template <typename T>
std::string direct(T const& v) {
    std::ostringstream oss;
    {
        boost::archive::binary_oarchive oa(oss, flags);
        oa << v;
    }
    return oss.str();
}

template <typename T>
std::string as_pod_array(T const& v) {
    std::ostringstream oss;
    {
        boost::archive::binary_oarchive oa(oss, flags);
        oa << v.size() << boost::serialization::make_array(v.data(), v.size());
    }
    return oss.str();
}

int main() {
    std::vector<int> i(100);
    std::cout << "direct: "       << direct(i).size() << "\n";
    std::cout << "as_pod_array: " << as_pod_array(i).size() << "\n";
}

Prints

direct: 408
as_pod_array: 408

Compression

The most straightforward way to optimize is to compress the resulting stream (see also the added benchmarks).

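Beyond that, you will have to override the default serialization and apply your own compression (which could be simple run-length encoding, Huffman coding, or something more domain-specific).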
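As a rough idea of what "your own compression" could look like at its simplest, here is a byte-level run-length encoding sketch. The functions `rle_encode` and `rle_decode` are hypothetical helpers written for this illustration, not Boost facilities, and the (count, byte) pair format is an assumption chosen for brevity. It only pays off on highly repetitive data, such as a zero-filled buffer; on data without runs it doubles the size.

```cpp
#include <cstddef>
#include <string>

// Encode the input as (run length, byte) pairs; each run is capped at 255.
inline std::string rle_encode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        std::size_t run = 1;
        while (i + run < in.size() && in[i + run] == in[i] && run < 255) ++run;
        out.push_back(static_cast<char>(run));  // run length, 1..255
        out.push_back(in[i]);                   // the repeated byte
        i += run;
    }
    return out;
}

// Inverse of rle_encode; a trailing unpaired byte is ignored.
inline std::string rle_decode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
        const std::size_t run = static_cast<unsigned char>(in[i]);
        out.append(run, in[i + 1]);
    }
    return out;
}
```

A buffer of 408 zero bytes encodes to 4 bytes this way (one run of 255 and one of 153), and decoding restores the original bytes.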

Demo

Live On Coliru

#include <boost/archive/binary_oarchive.hpp>
#include <boost/serialization/vector.hpp>
#include <boost/iostreams/filter/bzip2.hpp>
#include <boost/iostreams/filtering_stream.hpp>
#include <boost/iostreams/device/back_inserter.hpp>
#include <boost/iostreams/copy.hpp>
#include <cstddef>
#include <iostream>
#include <sstream>
#include <vector>

static auto const flags = boost::archive::no_header | boost::archive::no_tracking;

template <typename T>
size_t archive_size(T const& v)
{
    std::stringstream ss;
    {
        boost::archive::binary_oarchive oa(ss, flags);
        oa << v;
    }

    std::vector<char> compressed;
    {
        boost::iostreams::filtering_ostream fos;
        fos.push(boost::iostreams::bzip2_compressor());
        fos.push(boost::iostreams::back_inserter(compressed));

        boost::iostreams::copy(ss, fos);
    }

    return compressed.size();
}

int main() {
    std::vector<int> i(100);
    std::cout << "bzip2: " << archive_size(i) << "\n";
}

Prints

bzip2: 47

The compressed output is about 11% of the uncompressed size (47 of 408 bytes), or about 19% if you drop the archive flags.
