位压缩结构
Bit compressed structure
我目前正在做一个项目,我需要在向量中存储大量(~数十亿个单位)的结构。我还需要以线性方式迭代该向量,因此我需要处理的数据越少越好。
于是自然而然的开始优化单体结构的尺寸。例如,如果我有多个 bool 值,我可以将 true/false 值存储在一个位中并将所有 bool 值压缩为一个 char/16-bit,只要大小足够。对于某些条目,我只需要 20 位无符号整数。因此我可以再次压缩这些值。
然后我得到这样的结果(请注意,这只是简化的示例):
class Foo {
private:
uint32_t m_time;
uint32_t m_comb;
public:
Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
: m_time(t), m_comb((big << 12) | (small << 2) | (is_blue << 1) | is_nice)
{ }
uint32_t get_time() const { return m_time; }
uint32_t get_big() const { return m_comb >> 12; }
uint16_t get_small() const { return m_comb & 0b11111111100; }
uint16_t is_blue() const { return m_comb & 0b10; }
uint16_t is_nice() const { return m_comb & 0b1; }
};
问题是,这可以使用模板以某种方式自动化吗?我的想法是,我将插入条目的顺序、所需的位大小,然后我将能够调用 get<i>()
,这将 return 我是结构的第 i 个条目。我的动机是摆脱手动编写代码,因为在我看来,挠痒痒的部分很容易出错。我试图自己实现这一点,但无可救药地失败了。
您可以使用 bit fields 轻松实现。
class Foo {
private:
uint32_t m_time;
uint32_t m_big : 20;
uint32_t m_small : 10;
uint32_t m_isblue : 1;
uint32_t m_isnice : 1;
public:
Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
: m_time(t), m_big(big), m_small(small), m_isblue(is_blue), m_isnice(is_nice)
{ }
uint32_t get_time() const { return m_time; }
uint32_t get_big() const { return m_big; }
uint16_t get_small() const { return m_small; }
uint16_t is_blue() const { return m_isblue; }
uint16_t is_nice() const { return m_isnice; }
};
Online demo 显示尺寸。
编辑:附加信息
总结评论,编译器将位字段打包在一起的方式取决于实现:
9.6/1 (...) Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is
implementation-defined. Bit-fields are packed into some addressable
allocation unit. [Note: Bit-fields straddle allocation units on some
machines and not on others. Bit-fields are assigned right-to-left on
some machines, left-to-right on others. —end note ]
所以你不能保证,但编译器通常会尽力将它们放在一起。根据经验,只要您的位字段是连续的并且它们的总位数小于您使用的基本类型,那么打包很可能是最佳的。如果需要,您可以微调基本类型,如 this online demo 所示。
由于位字段是 implementation-defined
我决定编写位 read/write 函数。当你想在磁盘上存储数据(缓存或其他东西)时它会派上用场。
这里是源码模拟link.
#include <iostream>
bool GetBit(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0x1;
}
uint8_t GetByte(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0xFF;
}
uint16_t GetShort(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0xFFFF;
}
uint32_t SetBit(bool val, uint32_t &dest, uint8_t pos)
{
dest ^= (-val ^ dest) & (1 << pos);
return dest;
}
uint32_t SetByte(uint8_t val, uint32_t &dest, uint8_t pos)
{
dest &= ~(0xFF<<pos); //clean
dest |= val<<pos; //set
return dest;
}
uint32_t SetShort(uint16_t val, uint32_t &dest, uint8_t pos)
{
dest &= ~(0xFFFF<<pos); //clean
dest |= val<<pos; //set
return dest;
}
void PrintBin(uint32_t s, const char* pszComment)
{
std::cout << pszComment << ": " << std::endl;
for (size_t n = 0; n < sizeof(s) * 8; n++)
std::cout << GetBit(s, n) << ' ';
std::cout << std::endl;
}
int main()
{
uint32_t s = 4294967295; //all bits
PrintBin(s, "Start");
SetBit(false, s, 2);
PrintBin(s, "Set bit 2 to FALSE");
SetByte(0, s, 22);
PrintBin(s, "Set byte 22 to val 0");
SetByte(30, s, 22);
PrintBin(s, "Set byte 22 to val 30");
SetShort(0, s, 4);
PrintBin(s, "Set short 4 to val 0");
SetShort(65000, s, 4);
PrintBin(s, "Set short 4 to val 65000");
std::cout << "byte at 22 = " << (int)GetByte(s, 22) << std::endl;
std::cout << "short at 4 = " << (int)GetShort(s, 4) << std::endl;
}
输出:
Start:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Set bit 2 to FALSE:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Set byte 22 to val 0:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1
Set byte 22 to val 30:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1
Set short 4 to val 0:
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1
Set short 4 to val 65000:
1 1 0 1 0 0 0 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1
byte at 22 = 30
short at 4 = 65000
您可以将这些函数转换为 #DEFINE
s 并在调试模式下添加范围检查。
我目前正在做一个项目,我需要在向量中存储大量(~数十亿个单位)的结构。我还需要以线性方式迭代该向量,因此我需要处理的数据越少越好。
于是自然而然的开始优化单体结构的尺寸。例如,如果我有多个 bool 值,我可以将 true/false 值存储在一个位中并将所有 bool 值压缩为一个 char/16-bit,只要大小足够。对于某些条目,我只需要 20 位无符号整数。因此我可以再次压缩这些值。
然后我得到这样的结果(请注意,这只是简化的示例):
class Foo {
private:
uint32_t m_time;
uint32_t m_comb;
public:
Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
: m_time(t), m_comb((big << 12) | (small << 2) | (is_blue << 1) | is_nice)
{ }
uint32_t get_time() const { return m_time; }
uint32_t get_big() const { return m_comb >> 12; }
uint16_t get_small() const { return m_comb & 0b11111111100; }
uint16_t is_blue() const { return m_comb & 0b10; }
uint16_t is_nice() const { return m_comb & 0b1; }
};
问题是,这可以使用模板以某种方式自动化吗?我的想法是,我将插入条目的顺序、所需的位大小,然后我将能够调用 get<i>()
,这将 return 我是结构的第 i 个条目。我的动机是摆脱手动编写代码,因为在我看来,挠痒痒的部分很容易出错。我试图自己实现这一点,但无可救药地失败了。
您可以使用 bit fields 轻松实现。
class Foo {
private:
uint32_t m_time;
uint32_t m_big : 20;
uint32_t m_small : 10;
uint32_t m_isblue : 1;
uint32_t m_isnice : 1;
public:
Foo(uint32_t t, uint32_t big, uint16_t small, bool is_blue, bool is_nice)
: m_time(t), m_big(big), m_small(small), m_isblue(is_blue), m_isnice(is_nice)
{ }
uint32_t get_time() const { return m_time; }
uint32_t get_big() const { return m_big; }
uint16_t get_small() const { return m_small; }
uint16_t is_blue() const { return m_isblue; }
uint16_t is_nice() const { return m_isnice; }
};
Online demo 显示尺寸。
编辑:附加信息
总结评论,编译器将位字段打包在一起的方式取决于实现:
9.6/1 (...) Allocation of bit-fields within a class object is implementation-defined. Alignment of bit-fields is implementation-defined. Bit-fields are packed into some addressable allocation unit. [Note: Bit-fields straddle allocation units on some machines and not on others. Bit-fields are assigned right-to-left on some machines, left-to-right on others. —end note ]
所以你不能保证,但编译器通常会尽力将它们放在一起。根据经验,只要您的位字段是连续的并且它们的总位数小于您使用的基本类型,那么打包很可能是最佳的。如果需要,您可以微调基本类型,如 this online demo 所示。
由于位字段是 implementation-defined
我决定编写位 read/write 函数。当你想在磁盘上存储数据(缓存或其他东西)时它会派上用场。
这里是源码模拟link.
#include <iostream>
bool GetBit(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0x1;
}
uint8_t GetByte(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0xFF;
}
uint16_t GetShort(uint32_t num, uint8_t pos)
{
//check pos < sizeof(num)*8
return (num >> pos) & 0xFFFF;
}
uint32_t SetBit(bool val, uint32_t &dest, uint8_t pos)
{
dest ^= (-val ^ dest) & (1 << pos);
return dest;
}
uint32_t SetByte(uint8_t val, uint32_t &dest, uint8_t pos)
{
dest &= ~(0xFF<<pos); //clean
dest |= val<<pos; //set
return dest;
}
uint32_t SetShort(uint16_t val, uint32_t &dest, uint8_t pos)
{
dest &= ~(0xFFFF<<pos); //clean
dest |= val<<pos; //set
return dest;
}
void PrintBin(uint32_t s, const char* pszComment)
{
std::cout << pszComment << ": " << std::endl;
for (size_t n = 0; n < sizeof(s) * 8; n++)
std::cout << GetBit(s, n) << ' ';
std::cout << std::endl;
}
int main()
{
uint32_t s = 4294967295; //all bits
PrintBin(s, "Start");
SetBit(false, s, 2);
PrintBin(s, "Set bit 2 to FALSE");
SetByte(0, s, 22);
PrintBin(s, "Set byte 22 to val 0");
SetByte(30, s, 22);
PrintBin(s, "Set byte 22 to val 30");
SetShort(0, s, 4);
PrintBin(s, "Set short 4 to val 0");
SetShort(65000, s, 4);
PrintBin(s, "Set short 4 to val 65000");
std::cout << "byte at 22 = " << (int)GetByte(s, 22) << std::endl;
std::cout << "short at 4 = " << (int)GetShort(s, 4) << std::endl;
}
输出:
Start:
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Set bit 2 to FALSE:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Set byte 22 to val 0:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1
Set byte 22 to val 30:
1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1
Set short 4 to val 0:
1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 1 1 1 1 0 0 0 1 1
Set short 4 to val 65000:
1 1 0 1 0 0 0 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 1
byte at 22 = 30
short at 4 = 65000
您可以将这些函数转换为 #DEFINE
s 并在调试模式下添加范围检查。