Bitmask for exactly one byte in C

Question

My goal is to store a long in four bytes, like this:

unsigned char bytes[4];
unsigned long n = 123;

bytes[0] = (n >> 24) & 0xFF;
bytes[1] = (n >> 16) & 0xFF;
bytes[2] = (n >> 8) & 0xFF;
bytes[3] = n & 0xFF;

But I want the code to be portable, so I use CHAR_BIT from <limits.h>:

unsigned char bytes[4];
unsigned long n = 123;

bytes[0] = (n >> (CHAR_BIT * 3)) & 0xFF;
bytes[1] = (n >> (CHAR_BIT * 2)) & 0xFF;
bytes[2] = (n >> CHAR_BIT) & 0xFF;
bytes[3] = n & 0xFF;

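The problem is that the bitmask 0xFF covers only eight bits, which is not necessarily one byte. Is there a way to make the code above fully portable to all platforms?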

Answer 1

How about:

unsigned long mask = 1;
mask<<=CHAR_BIT;
mask-=1;

and then using that as the mask instead of 0xFF?

Test program:

#include <stdio.h>

int main() {
    #define MY_CHAR_BIT_8 8
    #define MY_CHAR_BIT_9 9
    #define MY_CHAR_BIT_10 10
    #define MY_CHAR_BIT_11 11
    #define MY_CHAR_BIT_12 12
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_8;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_9;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_10;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_11;
        mask-= 1;
        printf("%lx\n", mask);
    }
    {
        unsigned long mask = 1;
        mask<<=MY_CHAR_BIT_12;
        mask-= 1;
        printf("%lx\n", mask);
    }
}

Output:

ff
1ff
3ff
7ff
fff
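Plugged into the question's code, that mask could look like the minimal sketch below. `BYTE_MASK` and `store_ulong` are hypothetical names. One caveat: `1UL << CHAR_BIT` is only defined when `CHAR_BIT` is smaller than the width of `unsigned long`, so the sketch splits the shift into two steps (a swapped-in variant, not the answer's exact expression). It also assumes `unsigned long` is wider than `3 * CHAR_BIT` bits, as the question's code already does.

```c
#include <limits.h>

/* Mask with the low CHAR_BIT bits set. Shifting in two steps keeps each
 * shift count below the width of unsigned long, so this stays defined even
 * if unsigned long were only CHAR_BIT bits wide; the result would then wrap
 * to ULONG_MAX, which in that case equals UCHAR_MAX. */
#define BYTE_MASK (((1UL << (CHAR_BIT - 1)) << 1) - 1)

/* Hypothetical helper: split n into four bytes, most significant first.
 * Assumes unsigned long is wider than 3 * CHAR_BIT bits. */
static void store_ulong(unsigned char bytes[4], unsigned long n)
{
    bytes[0] = (n >> (CHAR_BIT * 3)) & BYTE_MASK;
    bytes[1] = (n >> (CHAR_BIT * 2)) & BYTE_MASK;
    bytes[2] = (n >> CHAR_BIT) & BYTE_MASK;
    bytes[3] = n & BYTE_MASK;
}
```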

Answer 2

How about:

unsigned long mask = (unsigned char)-1;

This works because the C standard says in 6.3.1.3p2:

1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

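So (unsigned char)-1 is UCHAR_MAX, and because unsigned long can represent every value of unsigned char, that value is stored in mask unchanged.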
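Worked through, the rule gives -1 + (UCHAR_MAX + 1) = UCHAR_MAX, i.e. a byte with every bit set, whatever CHAR_BIT is. A short sketch that uses this mask to pick out individual bytes; `byte_mask` and `get_byte` are hypothetical names, and `<limits.h>` also provides the same value directly as `UCHAR_MAX`.

```c
#include <limits.h>

/* By 6.3.1.3p2, converting -1 to unsigned char adds UCHAR_MAX + 1 once,
 * giving UCHAR_MAX. Widening that to unsigned long keeps the value
 * (6.3.1.3p1). */
static const unsigned long byte_mask = (unsigned char)-1;

/* Hypothetical helper: byte number `index` of n (0 = least significant).
 * Assumes index * CHAR_BIT is less than the width of unsigned long. */
static unsigned char get_byte(unsigned long n, unsigned index)
{
    return (unsigned char)((n >> (index * CHAR_BIT)) & byte_mask);
}
```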

Answer 3

#include <limits.h>
#include <stdio.h>

#define CHARMASK ((1UL << CHAR_BIT) - 1)

int main(void)
{
    printf("0x%lx\n", CHARMASK);   /* CHARMASK is unsigned long, so %lx */
}

The mask will then always be exactly as wide as a char. It is computed at compile time, with no extra variable needed.

Or:

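/* ~0u is UINT_MAX, which converts to UCHAR_MAX; plain ~0 equals -1 only on two's-complement machines */
#define CHARMASK    ((unsigned char)(~0u))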
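Because both macros are integer constant expressions, the "computed at compile time" claim can be checked at compile time too. A sketch using C11 `_Static_assert`; the second macro is renamed to the hypothetical `CHARMASK2` so both can coexist, and it starts from `~0u` (UINT_MAX) so the conversion does not depend on how signed `~0` is represented.

```c
#include <limits.h>

#define CHARMASK  ((1UL << CHAR_BIT) - 1)
#define CHARMASK2 ((unsigned char)~0u)   /* hypothetical name for the second variant */

/* A wrong mask becomes a compile error rather than a runtime bug. */
_Static_assert(CHARMASK == UCHAR_MAX, "CHARMASK must be exactly one byte wide");
_Static_assert(CHARMASK2 == UCHAR_MAX, "CHARMASK2 must be exactly one byte wide");
```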

You can also do it without a mask:

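#include <limits.h>
#include <stdio.h>

/* Assumes unsigned int is at least 4 * CHAR_BIT bits wide. The final
 * assignment to unsigned char keeps only the low CHAR_BIT bits. */
void foo(unsigned int n, unsigned char *bytes)
{
    bytes[0] = ((n << (CHAR_BIT * 0)) >> (CHAR_BIT * 3));
    bytes[1] = ((n << (CHAR_BIT * 1)) >> (CHAR_BIT * 3));
    bytes[2] = ((n << (CHAR_BIT * 2)) >> (CHAR_BIT * 3));
    bytes[3] = ((n << (CHAR_BIT * 3)) >> (CHAR_BIT * 3));
}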


int main(void)
{
    unsigned int z = 0xaabbccdd;
    unsigned char bytes[4];
    foo(z, bytes);
    printf("0x%02x 0x%02x 0x%02x 0x%02x\n", bytes[0], bytes[1], bytes[2], bytes[3]);
}
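The reason no mask is needed is the same conversion rule quoted in the previous answer: converting an unsigned value to unsigned char reduces it modulo UCHAR_MAX + 1, which keeps exactly the low CHAR_BIT bits. So a right shift followed by that conversion is already enough, as in this sketch (`split_no_mask` is a hypothetical name, and `unsigned long` is assumed to be wider than `3 * CHAR_BIT` bits).

```c
#include <limits.h>

/* Hypothetical variant: no mask and no left shifts. Each cast to
 * unsigned char discards everything above the low CHAR_BIT bits. */
static void split_no_mask(unsigned char bytes[4], unsigned long n)
{
    bytes[0] = (unsigned char)(n >> (CHAR_BIT * 3));
    bytes[1] = (unsigned char)(n >> (CHAR_BIT * 2));
    bytes[2] = (unsigned char)(n >> CHAR_BIT);
    bytes[3] = (unsigned char)n;
}
```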

Answer 4

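I work almost exclusively on embedded systems, where I often have to provide code that is portable between various more or less exotic systems, such as code that runs on both some tiny 8-bit MCU and x86_64.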

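But even for me, worrying about portability to exotic, obsolete DSP systems and the like is a huge waste of time. Such systems barely exist in the real world, so why would you need to port to them? Is there any reason besides "showing off" mostly useless C language-lawyer knowledge? In my experience, 99% of all these useless portability concerns come down to programmers "showing off" rather than to actual requirement specifications.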

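And even if you do need this kind of portability for some odd reason, the task makes no sense to begin with, because neither char nor long is portable! If char is not 8 bits, why do you assume long is 4 bytes? It could be 2 bytes, it could be 8 bytes, or something else.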

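If portability is a real concern, you have to use stdint.h. Then, if you really must support exotic systems, you have to decide which ones. The only real computers I know of that actually use a different byte size are various obsolete, exotic TI DSPs from the 1990s, which use 16-bit bytes/chars. Let's assume that is an important target you have decided to support.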

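Let's also assume that a standard C compiler (ISO 9899) exists for that exotic target, which is highly unlikely. (More likely you would get a non-conforming, mostly broken legacy C90 thing... or, even more likely, the people using that target write everything in assembler.) A standard C compiler would not implement uint8_t, because that type is not mandatory when the target cannot support it. Only uint_least8_t and uint_fast8_t are mandatory.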
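Since uint_least8_t is mandatory, one alternative that this answer does not show is to store octets in uint_least8_t and mask each one with 0xFF. That element may be wider than 8 bits (for instance 16 bits on such a DSP), so the mask is what keeps exactly one octet per element. A sketch; `uint32_to_octets` is a hypothetical name.

```c
#include <stdint.h>

/* uint_least8_t and uint_least32_t are mandatory, so this compiles even
 * where uint8_t does not exist. Each element may be wider than 8 bits;
 * masking with 0xFF stores exactly one octet per element, most
 * significant first. */
static void uint32_to_octets(uint_least8_t dst[4], uint_least32_t u32)
{
    dst[0] = (u32 >> 24) & 0xFF;
    dst[1] = (u32 >> 16) & 0xFF;
    dst[2] = (u32 >>  8) & 0xFF;
    dst[3] = (u32 >>  0) & 0xFF;
}
```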

Then you would do something like this:

#include <stdint.h>
#include <limits.h>
#if CHAR_BIT == 8
static void uint32_to_uint8 (uint8_t dst[4], uint32_t u32)
{
  dst[0] = (u32 >> 24) & 0xFF;
  dst[1] = (u32 >> 16) & 0xFF;
  dst[2] = (u32 >>  8) & 0xFF;
  dst[3] = (u32 >>  0) & 0xFF;
}
#endif 

// whatever other conversion functions you need:
static void uint32_to_uint16 (uint16_t dst[2], uint32_t u32){ ... }
static void uint64_to_uint16 (uint16_t dst[4], uint64_t u64){ ... }
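One possible way to fill in the two elided bodies, mirroring `uint32_to_uint8` with 16-bit words instead of 8-bit bytes. This is a sketch, not the answerer's code, and it assumes the optional `uint16_t` and `uint64_t` types exist on the target.

```c
#include <stdint.h>

/* Split into 16-bit words, most significant word first. */
static void uint32_to_uint16 (uint16_t dst[2], uint32_t u32)
{
  dst[0] = (u32 >> 16) & 0xFFFF;
  dst[1] = (u32 >>  0) & 0xFFFF;
}

/* A 64-bit value needs four 16-bit words. */
static void uint64_to_uint16 (uint16_t dst[4], uint64_t u64)
{
  dst[0] = (u64 >> 48) & 0xFFFF;
  dst[1] = (u64 >> 32) & 0xFFFF;
  dst[2] = (u64 >> 16) & 0xFFFF;
  dst[3] = (u64 >>  0) & 0xFFFF;
}
```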

The exotic DSP would use the uint32_to_uint16 function. You can use the same compile-time #if CHAR_BIT check to do #define byte_to_word uint32_to_uint16 and so on.
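A sketch of that compile-time selection, handling only the 8-bit and 16-bit byte cases discussed here. `store_uint32`, `store_unit_t` and `STORE_UINT32_LEN` are hypothetical names chosen for illustration; any other CHAR_BIT stops the build with `#error`.

```c
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT == 8
typedef uint8_t store_unit_t;
static void uint32_to_uint8 (uint8_t dst[4], uint32_t u32)
{
  dst[0] = (u32 >> 24) & 0xFF;
  dst[1] = (u32 >> 16) & 0xFF;
  dst[2] = (u32 >>  8) & 0xFF;
  dst[3] = (u32 >>  0) & 0xFF;
}
#define store_uint32     uint32_to_uint8
#define STORE_UINT32_LEN 4
#elif CHAR_BIT == 16
typedef uint16_t store_unit_t;
static void uint32_to_uint16 (uint16_t dst[2], uint32_t u32)
{
  dst[0] = (u32 >> 16) & 0xFFFF;
  dst[1] = (u32 >>  0) & 0xFFFF;
}
#define store_uint32     uint32_to_uint16
#define STORE_UINT32_LEN 2
#else
#error "unsupported CHAR_BIT"
#endif
```

Callers then write `store_unit_t buf[STORE_UINT32_LEN]; store_uint32(buf, value);` and get the right split for whichever target they are compiled for.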

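It should then also be immediately apparent that byte order (endianness) will be the next major portability problem. I don't know which byte order obsolete DSPs typically use, but that's another question.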
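Shift-based code like the above already avoids one part of that problem: shifts act on values, not on how the host lays them out in memory, so the output order is whatever the code chooses, on any host. A sketch of both orders for the CHAR_BIT == 8 case; `uint32_to_be` and `uint32_to_le` are hypothetical names. It does not answer how a particular DSP stores multi-word values in memory, which the answer leaves open.

```c
#include <stdint.h>

/* Most significant byte first (big-endian, "network" order). */
static void uint32_to_be(uint8_t dst[4], uint32_t u32)
{
  dst[0] = (u32 >> 24) & 0xFF;
  dst[1] = (u32 >> 16) & 0xFF;
  dst[2] = (u32 >>  8) & 0xFF;
  dst[3] = (u32 >>  0) & 0xFF;
}

/* Least significant byte first (little-endian order). */
static void uint32_to_le(uint8_t dst[4], uint32_t u32)
{
  dst[0] = (u32 >>  0) & 0xFF;
  dst[1] = (u32 >>  8) & 0xFF;
  dst[2] = (u32 >> 16) & 0xFF;
  dst[3] = (u32 >> 24) & 0xFF;
}
```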
