从 C 中的字节数组中提取 14 位值
Extract 14-bit values from an array of bytes in C
在 C 中任意大小的字节数组中,我想存储 14 位数字 (0-16,383) 紧密打包。换句话说,在序列中:
0000000000000100000000000001
有两个数字我希望能够任意存储和检索为 16 位整数。 (在这种情况下,它们都是 1,但可以是给定范围内的任何值)如果我有函数 uint16_t 14bitarr_get(unsigned char* arr, unsigned int index)
和 void 14bitarr_set(unsigned char* arr, unsigned int index, uint16_t value)
,我将如何实现这些函数?
这不是作业项目,只是我自己的好奇心。我有一个特定的项目将用于此,它是整个项目的 key/center。
我不想要一个包含 14 位值的结构数组,因为这会为存储的每个结构生成浪费位。我希望能够将尽可能多的 14 位值紧密打包到一个字节数组中。 (例如:在我发表的评论中,将尽可能多的 14 位值放入 64 字节的块中是可取的,没有浪费位。对于特定用例,这些 64 字节的工作方式完全紧凑,这样即使一点点的浪费就会失去存储另一个 14 位值的能力)
存储问题的基础
你面临的最大问题是“我的存储基础是什么?”这个基本问题你知道基础知识,你可以使用的是char
、short
、int
等...最小的是 8-bits
。无论你如何切片你的存储方案,它最终都必须基于这个每字节 8 位的布局以一个内存单元驻留在内存中。
唯一最佳的、不浪费位的内存分配是声明一个 char 数组,它是 14 位的最小公倍数。在这种情况下是完整的 112-bits
(7-shorts
或 14-chars
)。这可能是最好的选择。在这里,声明一个 7-shorts 或 14-chars 的数组,将允许精确存储 8 14-bit
个值。当然,如果你不需要其中的 8 个,那么它就没有多大用处,因为它浪费的不仅仅是单个无符号值上丢失的 4 位。
如果您想进一步探索,请告诉我。如果是,我很乐意帮助实施。
位域结构
关于bitfield packing或bit packing的评论正是你需要做的。这可能涉及单独的结构或与联合的组合,或者根据需要手动 right/left 直接移动值。
一个适用于您的情况的简短示例(如果我理解正确的话,您需要内存中的 2 个 14 位区域)是:
#include <stdio.h>
typedef struct bitarr14 {
unsigned n1 : 14,
n2 : 14;
} bitarr14;
char *binstr (unsigned long n, size_t sz);
int main (void) {
bitarr14 mybitfield;
mybitfield.n1 = 1;
mybitfield.n2 = 1;
printf ("\n mybitfield in memory : %s\n\n",
binstr (*(unsigned *)&mybitfield, 28));
return 0;
}
char *binstr (unsigned long n, size_t sz)
{
static char s[64 + 1] = {0};
char *p = s + 64;
register size_t i = 0;
for (i = 0; i < sz; i++) {
p--;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
输出
$ ./bin/bitfield14
mybitfield in memory : 0000000000000100000000000001
注意: 取消引用 mybitfield
以打印内存中的值 打破了严格的别名 并且它是有意的仅用于输出示例。
以提供的方式使用结构的好处和目的在于它允许直接访问结构的每个 14 位部分,而无需手动移位等。
更新 - 假设您想要大端位打包。这是用于固定大小代码字的代码。它基于我用于数据压缩算法的代码。开关盒和固定逻辑有助于提高性能。
typedef unsigned short uint16_t;
void bit14arr_set(unsigned char* arr, unsigned int index, uint16_t value)
{
unsigned int bitofs = (index*14)%8;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
*arr++ = (unsigned char)(value >> 6);
*arr &= 0x03;
*arr |= (unsigned char)(value << 2);
break;
case 2: /* bit offset == 2 */
*arr &= 0xc0;
*arr++ |= (unsigned char)(value >> 8);
*arr = (unsigned char)(value << 0);
break;
case 4: /* bit offset == 4 */
*arr &= 0xf0;
*arr++ |= (unsigned char)(value >> 10);
*arr++ = (unsigned char)(value >> 2);
*arr &= 0x3f;
*arr |= (unsigned char)(value << 6);
break;
case 6: /* bit offset == 6 */
*arr &= 0xfc;
*arr++ |= (unsigned char)(value >> 12);
*arr++ = (unsigned char)(value >> 4);
*arr &= 0x0f;
*arr |= (unsigned char)(value << 4);
break;
}
}
uint16_t bit14arr_get(unsigned char* arr, unsigned int index)
{
unsigned int bitofs = (index*14)%8;
unsigned short value;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
value = ((unsigned int)(*arr++) ) << 6;
value |= ((unsigned int)(*arr ) ) >> 2;
break;
case 2: /* bit offset == 2 */
value = ((unsigned int)(*arr++)&0x3f) << 8;
value |= ((unsigned int)(*arr ) ) >> 0;
break;
case 4: /* bit offset == 4 */
value = ((unsigned int)(*arr++)&0x0f) << 10;
value |= ((unsigned int)(*arr++) ) << 2;
value |= ((unsigned int)(*arr ) ) >> 6;
break;
case 6: /* bit offset == 6 */
value = ((unsigned int)(*arr++)&0x03) << 12;
value |= ((unsigned int)(*arr++) ) << 4;
value |= ((unsigned int)(*arr ) ) >> 4;
break;
}
return value;
}
这是我的版本(已更新以修复错误):
#define PACKWID 14 // number of bits in packed number
#define PACKMSK ((1 << PACKWID) - 1)
#ifndef ARCHBYTEALIGN
#define ARCHBYTEALIGN 1 // align to 1=bytes, 2=words
#endif
#define ARCHBITALIGN (ARCHBYTEALIGN * 8)
typedef unsigned char byte;
typedef unsigned short u16;
typedef unsigned int u32;
typedef long long s64;
typedef u16 pcknum_t; // container for packed number
typedef u32 acc_t; // working accumulator
#ifndef ARYOFF
#define ARYOFF long
#endif
#define PRT(_val) ((unsigned long) _val)
typedef unsigned ARYOFF aryoff_t; // bit offset
// packary -- access array of packed numbers
// RETURNS: old value
extern inline pcknum_t
packary(byte *ary,aryoff_t idx,int setflg,pcknum_t newval)
// ary -- byte array pointer
// idx -- index into array (packed number relative)
// setflg -- 1=set new value, 0=just get old value
// newval -- new value to set (if setflg set)
{
aryoff_t absbitoff;
aryoff_t bytoff;
aryoff_t absbitlhs;
acc_t acc;
acc_t nval;
int shf;
acc_t curmsk;
pcknum_t oldval;
// get the absolute bit number for the given array index
absbitoff = idx * PACKWID;
// get the byte offset of the lowest byte containing the number
bytoff = absbitoff / ARCHBITALIGN;
// get absolute bit offset of first containing byte
absbitlhs = bytoff * ARCHBITALIGN;
// get amount we need to shift things by:
// (1) our accumulator
// (2) values to set/get
shf = absbitoff - absbitlhs;
#ifdef MODSHOW
do {
static int modshow;
if (modshow > 50)
break;
++modshow;
printf("packary: MODSHOW idx=%ld shf=%d bytoff=%ld absbitlhs=%ld absbitoff=%ld\n",
PRT(idx),shf,PRT(bytoff),PRT(absbitlhs),PRT(absbitoff));
} while (0);
#endif
// adjust array pointer to the portion we want (guaranteed to span)
ary += bytoff * ARCHBYTEALIGN;
// fetch the number + some other bits
acc = *(acc_t *) ary;
// get the old value
oldval = (acc >> shf) & PACKMSK;
// set the new value
if (setflg) {
// get shifted mask for packed number
curmsk = PACKMSK << shf;
// remove the old value
acc &= ~curmsk;
// ensure caller doesn't pass us a bad value
nval = newval;
#if 0
nval &= PACKMSK;
#endif
nval <<= shf;
// add in the value
acc |= nval;
*(acc_t *) ary = acc;
}
return oldval;
}
pcknum_t
int_get(byte *ary,aryoff_t idx)
{
return packary(ary,idx,0,0);
}
void
int_set(byte *ary,aryoff_t idx,pcknum_t newval)
{
packary(ary,idx,1,newval);
}
以下是基准测试:
set: 354740751 7.095 -- gene
set: 203407176 4.068 -- rcgldr
set: 298946533 5.979 -- craig
get: 268574627 5.371 -- gene
get: 166839767 3.337 -- rcgldr
get: 207764612 4.155 -- craig
嗯,这是最好的摆弄。用字节数组做它比用更大的元素更复杂,因为单个 14 位数量可以跨越 3 个字节,其中 uint16_t 或任何更大的东西都需要不超过两个。但我会相信你的话,这就是你想要的(没有双关语意)。这段代码实际上可以将常量设置为任何 8 或更大的值(但不能超过 int
的大小;为此,需要额外的类型转换)。当然如果大于16就要调整值类型
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define W 14
uint16_t arr_get(unsigned char* arr, size_t index) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
uint16_t result = arr[byte_index] >> bit_in_byte_index;
for (unsigned n_bits = 8 - bit_in_byte_index; n_bits < W; n_bits += 8)
result |= arr[++byte_index] << n_bits;
return result & ~(~0u << W);
}
void arr_set(unsigned char* arr, size_t index, uint16_t value) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
arr[byte_index] &= ~(0xff << bit_in_byte_index);
arr[byte_index++] |= value << bit_in_byte_index;
unsigned n_bits = 8 - bit_in_byte_index;
value >>= n_bits;
while (n_bits < W - 8) {
arr[byte_index++] = value;
value >>= 8;
n_bits += 8;
}
arr[byte_index] &= 0xff << (W - n_bits);
arr[byte_index] |= value;
}
int main(void) {
int mod = 1 << W;
int n = 50000;
unsigned x[n];
unsigned char b[2 * n];
for (int tries = 0; tries < 10000; tries++) {
for (int i = 0; i < n; i++) {
x[i] = rand() % mod;
arr_set(b, i, x[i]);
}
for (int i = 0; i < n; i++)
if (arr_get(b, i) != x[i])
printf("Err @%d: %d should be %d\n", i, arr_get(b, i), x[i]);
}
return 0;
}
更快的版本 因为你在评论中说性能是一个问题:开放编码循环在我的机器上使用包含在原来的。这包括随机数生成和测试,因此基元可能快 20%。我相信 16 位或 32 位数组元素会带来进一步的改进,因为字节访问很昂贵:
uint16_t arr_get(unsigned char* a, size_t i) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
return (a[iy] | (a[iy+1] << 8)) & 0x3fff;
case 2:
return ((a[iy] >> 2) | (a[iy+1] << 6)) & 0x3fff;
case 4:
return ((a[iy] >> 4) | (a[iy+1] << 4) | (a[iy+2] << 12)) & 0x3fff;
}
return ((a[iy] >> 6) | (a[iy+1] << 2) | (a[iy+2] << 10)) & 0x3fff;
}
#define M(IB) (~0u << (IB))
#define SETLO(IY, IB, V) a[IY] = (a[IY] & M(IB)) | ((V) >> (14 - (IB)))
#define SETHI(IY, IB, V) a[IY] = (a[IY] & ~M(IB)) | ((V) << (IB))
void arr_set(unsigned char* a, size_t i, uint16_t val) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
a[iy] = val;
SETLO(iy+1, 6, val);
return;
case 2:
SETHI(iy, 2, val);
a[iy+1] = val >> 6;
return;
case 4:
SETHI(iy, 4, val);
a[iy+1] = val >> 4;
SETLO(iy+2, 2, val);
return;
}
SETHI(iy, 6, val);
a[iy+1] = val >> 2;
SETLO(iy+2, 4, val);
}
另一种变体
这在我的机器上要快得多,比上面的好大约 20%:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> (ib % 8)) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
unsigned io = ib % 8;
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
请注意,为了使此代码安全,您应该在打包数组的末尾分配一个额外的字节。它总是读取和写入 3 个字节,即使所需的 14 位在前 2 个中也是如此。
另一个变体 最后,它的运行速度比上面的慢一点(同样在我的机器上;YMMV),但您不需要额外的字节。每个操作使用一个比较:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
unsigned buf = ib % 8 <= 2
? a[iy] | (a[iy+1] << 8)
: a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> io) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
if (io <= 2) {
unsigned buf = a[iy] | (a[iy+1] << 8);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
} else {
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
}
最简单的解决方案是使用 struct
八个位域:
typedef struct __attribute__((__packed__)) EightValues {
uint16_t v0 : 14,
v1 : 14,
v2 : 14,
v3 : 14,
v4 : 14,
v5 : 14,
v6 : 14,
v7 : 14;
} EightValues;
这个结构体的大小为14*8 = 112
位,即14字节(7个uint16_t
)。现在,您只需要使用数组索引的最后三位 select 正确的位域:
uint16_t 14bitarr_get(unsigned char* arr, unsigned int index) {
EightValues* accessPointer = (EightValues*)arr;
accessPointer += index >> 3; //select the right structure in the array
switch(index & 7) { //use the last three bits of the index to access the right bitfield
case 0: return accessPointer->v0;
case 1: return accessPointer->v1;
case 2: return accessPointer->v2;
case 3: return accessPointer->v3;
case 4: return accessPointer->v4;
case 5: return accessPointer->v5;
case 6: return accessPointer->v6;
case 7: return accessPointer->v7;
}
}
您的编译器会为您完成位操作。
在 C 中任意大小的字节数组中,我想存储 14 位数字 (0-16,383) 紧密打包。换句话说,在序列中:
0000000000000100000000000001
有两个数字我希望能够任意存储和检索为 16 位整数。 (在这种情况下,它们都是 1,但可以是给定范围内的任何值)如果我有函数 uint16_t 14bitarr_get(unsigned char* arr, unsigned int index)
和 void 14bitarr_set(unsigned char* arr, unsigned int index, uint16_t value)
,我将如何实现这些函数?
这不是作业项目,只是我自己的好奇心。我有一个特定的项目将用于此,它是整个项目的 key/center。
我不想要一个包含 14 位值的结构数组,因为这会为存储的每个结构生成浪费位。我希望能够将尽可能多的 14 位值紧密打包到一个字节数组中。 (例如:在我发表的评论中,将尽可能多的 14 位值放入 64 字节的块中是可取的,没有浪费位。对于特定用例,这些 64 字节的工作方式完全紧凑,这样即使一点点的浪费就会失去存储另一个 14 位值的能力)
存储问题的基础
你面临的最大问题是“我的存储基础是什么?”这个基本问题你知道基础知识,你可以使用的是char
、short
、int
等...最小的是 8-bits
。无论你如何切片你的存储方案,它最终都必须基于这个每字节 8 位的布局以一个内存单元驻留在内存中。
唯一最佳的、不浪费位的内存分配是声明一个 char 数组,它是 14 位的最小公倍数。在这种情况下是完整的 112-bits
(7-shorts
或 14-chars
)。这可能是最好的选择。在这里,声明一个 7-shorts 或 14-chars 的数组,将允许精确存储 8 14-bit
个值。当然,如果你不需要其中的 8 个,那么它就没有多大用处,因为它浪费的不仅仅是单个无符号值上丢失的 4 位。
如果您想进一步探索,请告诉我。如果是,我很乐意帮助实施。
位域结构
关于bitfield packing或bit packing的评论正是你需要做的。这可能涉及单独的结构或与联合的组合,或者根据需要手动 right/left 直接移动值。
一个适用于您的情况的简短示例(如果我理解正确的话,您需要内存中的 2 个 14 位区域)是:
#include <stdio.h>
typedef struct bitarr14 {
unsigned n1 : 14,
n2 : 14;
} bitarr14;
char *binstr (unsigned long n, size_t sz);
int main (void) {
bitarr14 mybitfield;
mybitfield.n1 = 1;
mybitfield.n2 = 1;
printf ("\n mybitfield in memory : %s\n\n",
binstr (*(unsigned *)&mybitfield, 28));
return 0;
}
char *binstr (unsigned long n, size_t sz)
{
static char s[64 + 1] = {0};
char *p = s + 64;
register size_t i = 0;
for (i = 0; i < sz; i++) {
p--;
*p = (n >> i & 1) ? '1' : '0';
}
return p;
}
输出
$ ./bin/bitfield14
mybitfield in memory : 0000000000000100000000000001
注意: 取消引用 mybitfield
以打印内存中的值 打破了严格的别名 并且它是有意的仅用于输出示例。
以提供的方式使用结构的好处和目的在于它允许直接访问结构的每个 14 位部分,而无需手动移位等。
更新 - 假设您想要大端位打包。这是用于固定大小代码字的代码。它基于我用于数据压缩算法的代码。开关盒和固定逻辑有助于提高性能。
typedef unsigned short uint16_t;
void bit14arr_set(unsigned char* arr, unsigned int index, uint16_t value)
{
unsigned int bitofs = (index*14)%8;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
*arr++ = (unsigned char)(value >> 6);
*arr &= 0x03;
*arr |= (unsigned char)(value << 2);
break;
case 2: /* bit offset == 2 */
*arr &= 0xc0;
*arr++ |= (unsigned char)(value >> 8);
*arr = (unsigned char)(value << 0);
break;
case 4: /* bit offset == 4 */
*arr &= 0xf0;
*arr++ |= (unsigned char)(value >> 10);
*arr++ = (unsigned char)(value >> 2);
*arr &= 0x3f;
*arr |= (unsigned char)(value << 6);
break;
case 6: /* bit offset == 6 */
*arr &= 0xfc;
*arr++ |= (unsigned char)(value >> 12);
*arr++ = (unsigned char)(value >> 4);
*arr &= 0x0f;
*arr |= (unsigned char)(value << 4);
break;
}
}
uint16_t bit14arr_get(unsigned char* arr, unsigned int index)
{
unsigned int bitofs = (index*14)%8;
unsigned short value;
arr += (index*14)/8;
switch(bitofs){
case 0: /* bit offset == 0 */
value = ((unsigned int)(*arr++) ) << 6;
value |= ((unsigned int)(*arr ) ) >> 2;
break;
case 2: /* bit offset == 2 */
value = ((unsigned int)(*arr++)&0x3f) << 8;
value |= ((unsigned int)(*arr ) ) >> 0;
break;
case 4: /* bit offset == 4 */
value = ((unsigned int)(*arr++)&0x0f) << 10;
value |= ((unsigned int)(*arr++) ) << 2;
value |= ((unsigned int)(*arr ) ) >> 6;
break;
case 6: /* bit offset == 6 */
value = ((unsigned int)(*arr++)&0x03) << 12;
value |= ((unsigned int)(*arr++) ) << 4;
value |= ((unsigned int)(*arr ) ) >> 4;
break;
}
return value;
}
这是我的版本(已更新以修复错误):
#define PACKWID 14 // number of bits in packed number
#define PACKMSK ((1 << PACKWID) - 1)
#ifndef ARCHBYTEALIGN
#define ARCHBYTEALIGN 1 // align to 1=bytes, 2=words
#endif
#define ARCHBITALIGN (ARCHBYTEALIGN * 8)
typedef unsigned char byte;
typedef unsigned short u16;
typedef unsigned int u32;
typedef long long s64;
typedef u16 pcknum_t; // container for packed number
typedef u32 acc_t; // working accumulator
#ifndef ARYOFF
#define ARYOFF long
#endif
#define PRT(_val) ((unsigned long) _val)
typedef unsigned ARYOFF aryoff_t; // bit offset
// packary -- access array of packed numbers
// RETURNS: old value
extern inline pcknum_t
packary(byte *ary,aryoff_t idx,int setflg,pcknum_t newval)
// ary -- byte array pointer
// idx -- index into array (packed number relative)
// setflg -- 1=set new value, 0=just get old value
// newval -- new value to set (if setflg set)
{
aryoff_t absbitoff;
aryoff_t bytoff;
aryoff_t absbitlhs;
acc_t acc;
acc_t nval;
int shf;
acc_t curmsk;
pcknum_t oldval;
// get the absolute bit number for the given array index
absbitoff = idx * PACKWID;
// get the byte offset of the lowest byte containing the number
bytoff = absbitoff / ARCHBITALIGN;
// get absolute bit offset of first containing byte
absbitlhs = bytoff * ARCHBITALIGN;
// get amount we need to shift things by:
// (1) our accumulator
// (2) values to set/get
shf = absbitoff - absbitlhs;
#ifdef MODSHOW
do {
static int modshow;
if (modshow > 50)
break;
++modshow;
printf("packary: MODSHOW idx=%ld shf=%d bytoff=%ld absbitlhs=%ld absbitoff=%ld\n",
PRT(idx),shf,PRT(bytoff),PRT(absbitlhs),PRT(absbitoff));
} while (0);
#endif
// adjust array pointer to the portion we want (guaranteed to span)
ary += bytoff * ARCHBYTEALIGN;
// fetch the number + some other bits
acc = *(acc_t *) ary;
// get the old value
oldval = (acc >> shf) & PACKMSK;
// set the new value
if (setflg) {
// get shifted mask for packed number
curmsk = PACKMSK << shf;
// remove the old value
acc &= ~curmsk;
// ensure caller doesn't pass us a bad value
nval = newval;
#if 0
nval &= PACKMSK;
#endif
nval <<= shf;
// add in the value
acc |= nval;
*(acc_t *) ary = acc;
}
return oldval;
}
pcknum_t
int_get(byte *ary,aryoff_t idx)
{
return packary(ary,idx,0,0);
}
void
int_set(byte *ary,aryoff_t idx,pcknum_t newval)
{
packary(ary,idx,1,newval);
}
以下是基准测试:
set: 354740751 7.095 -- gene set: 203407176 4.068 -- rcgldr set: 298946533 5.979 -- craig get: 268574627 5.371 -- gene get: 166839767 3.337 -- rcgldr get: 207764612 4.155 -- craig
嗯,这是最好的摆弄。用字节数组做它比用更大的元素更复杂,因为单个 14 位数量可以跨越 3 个字节,其中 uint16_t 或任何更大的东西都需要不超过两个。但我会相信你的话,这就是你想要的(没有双关语意)。这段代码实际上可以将常量设置为任何 8 或更大的值(但不能超过 int
的大小;为此,需要额外的类型转换)。当然如果大于16就要调整值类型
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#define W 14
uint16_t arr_get(unsigned char* arr, size_t index) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
uint16_t result = arr[byte_index] >> bit_in_byte_index;
for (unsigned n_bits = 8 - bit_in_byte_index; n_bits < W; n_bits += 8)
result |= arr[++byte_index] << n_bits;
return result & ~(~0u << W);
}
void arr_set(unsigned char* arr, size_t index, uint16_t value) {
size_t bit_index = W * index;
size_t byte_index = bit_index / 8;
unsigned bit_in_byte_index = bit_index % 8;
arr[byte_index] &= ~(0xff << bit_in_byte_index);
arr[byte_index++] |= value << bit_in_byte_index;
unsigned n_bits = 8 - bit_in_byte_index;
value >>= n_bits;
while (n_bits < W - 8) {
arr[byte_index++] = value;
value >>= 8;
n_bits += 8;
}
arr[byte_index] &= 0xff << (W - n_bits);
arr[byte_index] |= value;
}
int main(void) {
int mod = 1 << W;
int n = 50000;
unsigned x[n];
unsigned char b[2 * n];
for (int tries = 0; tries < 10000; tries++) {
for (int i = 0; i < n; i++) {
x[i] = rand() % mod;
arr_set(b, i, x[i]);
}
for (int i = 0; i < n; i++)
if (arr_get(b, i) != x[i])
printf("Err @%d: %d should be %d\n", i, arr_get(b, i), x[i]);
}
return 0;
}
更快的版本 因为你在评论中说性能是一个问题:开放编码循环在我的机器上使用包含在原来的。这包括随机数生成和测试,因此基元可能快 20%。我相信 16 位或 32 位数组元素会带来进一步的改进,因为字节访问很昂贵:
uint16_t arr_get(unsigned char* a, size_t i) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
return (a[iy] | (a[iy+1] << 8)) & 0x3fff;
case 2:
return ((a[iy] >> 2) | (a[iy+1] << 6)) & 0x3fff;
case 4:
return ((a[iy] >> 4) | (a[iy+1] << 4) | (a[iy+2] << 12)) & 0x3fff;
}
return ((a[iy] >> 6) | (a[iy+1] << 2) | (a[iy+2] << 10)) & 0x3fff;
}
#define M(IB) (~0u << (IB))
#define SETLO(IY, IB, V) a[IY] = (a[IY] & M(IB)) | ((V) >> (14 - (IB)))
#define SETHI(IY, IB, V) a[IY] = (a[IY] & ~M(IB)) | ((V) << (IB))
void arr_set(unsigned char* a, size_t i, uint16_t val) {
size_t ib = 14 * i;
size_t iy = ib / 8;
switch (ib % 8) {
case 0:
a[iy] = val;
SETLO(iy+1, 6, val);
return;
case 2:
SETHI(iy, 2, val);
a[iy+1] = val >> 6;
return;
case 4:
SETHI(iy, 4, val);
a[iy+1] = val >> 4;
SETLO(iy+2, 2, val);
return;
}
SETHI(iy, 6, val);
a[iy+1] = val >> 2;
SETLO(iy+2, 4, val);
}
另一种变体 这在我的机器上要快得多,比上面的好大约 20%:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> (ib % 8)) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
unsigned io = ib % 8;
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
请注意,为了使此代码安全,您应该在打包数组的末尾分配一个额外的字节。它总是读取和写入 3 个字节,即使所需的 14 位在前 2 个中也是如此。
另一个变体 最后,它的运行速度比上面的慢一点(同样在我的机器上;YMMV),但您不需要额外的字节。每个操作使用一个比较:
uint16_t arr_get2(unsigned char* a, size_t i) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
unsigned buf = ib % 8 <= 2
? a[iy] | (a[iy+1] << 8)
: a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
return (buf >> io) & 0x3fff;
}
void arr_set2(unsigned char* a, size_t i, unsigned val) {
size_t ib = i * 14;
size_t iy = ib / 8;
unsigned io = ib % 8;
if (io <= 2) {
unsigned buf = a[iy] | (a[iy+1] << 8);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
} else {
unsigned buf = a[iy] | (a[iy+1] << 8) | (a[iy+2] << 16);
buf = (buf & ~(0x3fff << io)) | (val << io);
a[iy] = buf;
a[iy+1] = buf >> 8;
a[iy+2] = buf >> 16;
}
}
最简单的解决方案是使用 struct
八个位域:
typedef struct __attribute__((__packed__)) EightValues {
uint16_t v0 : 14,
v1 : 14,
v2 : 14,
v3 : 14,
v4 : 14,
v5 : 14,
v6 : 14,
v7 : 14;
} EightValues;
这个结构体的大小为14*8 = 112
位,即14字节(7个uint16_t
)。现在,您只需要使用数组索引的最后三位 select 正确的位域:
uint16_t 14bitarr_get(unsigned char* arr, unsigned int index) {
EightValues* accessPointer = (EightValues*)arr;
accessPointer += index >> 3; //select the right structure in the array
switch(index & 7) { //use the last three bits of the index to access the right bitfield
case 0: return accessPointer->v0;
case 1: return accessPointer->v1;
case 2: return accessPointer->v2;
case 3: return accessPointer->v3;
case 4: return accessPointer->v4;
case 5: return accessPointer->v5;
case 6: return accessPointer->v6;
case 7: return accessPointer->v7;
}
}
您的编译器会为您完成位操作。