Bitwise operators and signed types

Question

I'm reading C++ Primer and I'm slightly confused by a few comments about how the bitwise operators deal with signed types. I'll quote:

Quote #1

(When talking about Bitwise operators) "If the operand is signed and its value is negative, then the way that the “sign bit” is handled in a number of the bitwise operations is machine dependent. Moreover, doing a left shift that changes the value of the sign bit is undefined"

Quote #2

(When talking about the rightshift operator) "If that operand is unsigned, then the operator inserts 0-valued bits on the left; if it is a signed type, the result is implementation defined—either copies of the sign bit or 0-valued bits are inserted on the left."

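The bitwise operators promote small integers (such as char) to signed ints. Isn't this promotion to signed int a problem, given that bitwise operators often give undefined or implementation-defined behaviour on signed operand types? Why doesn't the standard promote char to unsigned int?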

Edit: Here is the question I took out, but I've put it back for context, since some of the answers below refer to it.

A later exercise asks

"What is the value of ~'q' << 6 on a machine with 32-bit ints and 8 bit chars, that uses Latin-1 character set in which 'q' has the bit pattern 01110001?"

Well, 'q' is a character literal and would be promoted to int, giving

~'q' == ~00000000 00000000 00000000 01110001 == 11111111 11111111 11111111 10001110

The next step is to apply the left shift operator to the bits above, but as quote #1 mentions

"doing a left shift that changes the value of the sign bit is undefined"

Well, I don't know exactly which bit is the sign bit, but surely the answer is undefined?

Answer 1

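Of course, now that the question has been edited, my answer partly addresses a different question from the one posed, so here is an attempt to answer the "new" question: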

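The promotion rules (what gets converted to what) are well defined in the standard. The type char may be either signed or unsigned - in some compilers you can even pass a flag that says "I want unsigned char type" or "I want signed char type" - but most compilers simply define char as one or the other.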
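As a small sketch of how to check which choice a given compiler made, the standard library can report the signedness of plain char at compile time. The name plain_char_is_signed is made up for this illustration.

```cpp
#include <climits>
#include <type_traits>

// True when this compiler (with its current flags) treats plain char as signed.
// The variable name is hypothetical, chosen only for this example.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;

// The same information is visible through CHAR_MIN: it is negative exactly
// when plain char is signed.
static_assert(plain_char_is_signed == (CHAR_MIN < 0),
              "is_signed<char> and CHAR_MIN agree");

// Whichever way it goes, char stays a distinct type from both
// signed char and unsigned char.
static_assert(!std::is_same<char, signed char>::value, "char != signed char");
static_assert(!std::is_same<char, unsigned char>::value, "char != unsigned char");
```

Code that must behave the same on every platform can avoid the question by using signed char or unsigned char explicitly.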

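A constant such as 6 is signed by default. When you write an operation such as 'q' << 6, the integral promotions convert any operand smaller than int to int [as happens to char in arithmetic generally], so 'q' becomes an int holding the character's value. To avoid that, use an explicit cast such as static_cast<unsigned>('q') << 6 - that way you are sure the left operand is unsigned rather than signed. (Writing 6u on its own is not enough for a shift, because the result takes the type of the promoted left operand; an unsigned constant only forces unsigned arithmetic in operators such as & and |, which convert both operands to a common type.)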
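A minimal sketch of the cast-before-shift idea, assuming 8-bit chars. The helper name shift_char_code is hypothetical and exists only for this example.

```cpp
// Shift a character code left by 6 without ever shifting a signed value.
// `shift_char_code` is a made-up helper name for illustration.
unsigned shift_char_code(char c) {
    // Go through unsigned char first, so that a negative char (on platforms
    // where plain char is signed) maps to 0..255 instead of a huge value.
    unsigned bits = static_cast<unsigned char>(c);
    return bits << 6;  // unsigned shift: well defined, no sign bit involved
}
```

For the bit pattern from the exercise, shift_char_code('\x71') is 113 * 64 = 7232.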

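The operations are undefined because different hardware behaves differently, and there are architectures with "strange" number systems, which means the standards committee has to choose between "ruling out/making operations extremely inefficient" and "defining the standard in a way that isn't very clear". On a few architectures, integer overflow may also trap, and a shift that changes the sign of the number typically counts as overflow. Since a trap usually means "your code no longer runs", which is not what the average programmer expects, this falls under the umbrella of "undefined behaviour". Most processors don't trap, and nothing really bad happens if you do this.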

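Old answer: So the way to avoid this is to always cast your signed values (including char) to unsigned before shifting them (or accept that your code may not work on another compiler, on the same compiler with different options, or on the next release of the same compiler).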

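It is also worth noting that the resulting value is "nearly always what you expect" (the compiler/processor simply performs the left or right shift on the value, and a right shift fills from the sign bit as it shifts down). It is only undefined or implementation-defined because SOME machine architectures may not have hardware to "do this right", and C compilers still need to work on those systems.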
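For code that needs the "copies of the sign bit" behaviour without relying on what the implementation does for negative operands, one possible sketch is below. It assumes a two's complement int (so that ~x equals -x - 1), and the helper name arithmetic_shift_right is invented for this example.

```cpp
// Right shift that always fills with copies of the sign bit.
// Assumption: two's complement int, so ~value == -value - 1.
// Precondition: 0 <= count < number of bits in int.
int arithmetic_shift_right(int value, int count) {
    if (value >= 0) {
        return value >> count;   // well defined for non-negative values
    }
    // ~value is non-negative here, so shifting it is well defined;
    // complementing again restores the sign and the shifted-in one bits.
    return ~(~value >> count);
}
```

For example, arithmetic_shift_right(-8, 1) gives -4 and arithmetic_shift_right(-1, 5) gives -1.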

The sign bit is the highest bit in two's complement, and you are not changing it by shifting this number:

       11111111 11111111 11111111 10001110 << 6 =
111111 11111111 11111111 11100011 10000000
^^^^^^--- goes away.
result=11111111 11111111 11100011 10000000

Or as a hex number: 0xffffe380.
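The same 32-bit pattern as in the diagram can be obtained without touching a signed value at all. This is only a sketch: complement_shift6 is a hypothetical name, and the explicit mask is there so the result is the same whether unsigned long is 32 or 64 bits wide.

```cpp
#include <cstdint>

// Computes the low 32 bits of (~c << 6) using only unsigned arithmetic.
// `complement_shift6` is a made-up name for this illustration.
std::uint32_t complement_shift6(unsigned char c) {
    // unsigned long has at least 32 bits and is never promoted to int,
    // so both ~ and << operate on an unsigned type here.
    unsigned long bits = c;
    unsigned long shifted = (~bits) << 6;   // bits shifted off the top are discarded
    return static_cast<std::uint32_t>(shifted & 0xFFFFFFFFul);
}
```

With the pattern from the exercise, complement_shift6(0x71) yields 0xFFFFE380.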

Answer 2

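You're quite correct -- the expression ~'q' << 6 is undefined behaviour according to the standard. It's even worse than you state, because the ~ operator is defined as computing "The one's complement" of the value, which is meaningless for a signed (2s-complement) integer -- the term "one's complement" only really means anything for an unsigned integer.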

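When doing bitwise operations, if you want strictly well-defined (according to the standard) results, you generally have to make sure the values being operated on are unsigned. You can do that either with explicit casts or by using explicitly unsigned constants (U suffix) in binary operations. A binary operation on a signed int and an unsigned int is done as unsigned (the signed value is converted to unsigned).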
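A short sketch of the mixed signed/unsigned case for &, where a U-suffixed constant is enough to make the whole operation unsigned. The helper name low_byte is hypothetical.

```cpp
#include <type_traits>

// With an int operand and an unsigned int operand, the usual arithmetic
// conversions pick unsigned int as the common type for & (likewise | and ^).
static_assert(std::is_same<decltype(-1 & 0xFFu), unsigned>::value,
              "int & unsigned yields unsigned");

// Extracts the low 8 bits of a possibly negative int.
// `low_byte` is a made-up helper name for illustration.
unsigned low_byte(int value) {
    // value is converted to unsigned (modulo 2^N) before the AND is applied,
    // so no bitwise operation is performed on a negative number.
    return value & 0xFFu;
}
```

For instance, low_byte(-30) is 226, because -30 converts to 2^N - 30 and its low byte is 256 - 30.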

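C and C++ are subtly different with respect to the integer promotions, so you need to be careful here -- C++ will convert a smaller-than-int unsigned value to int (signed) before comparing it with the other operand to see what should be done, while C will compare the operands first.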

Answer 3

Reading the exact text of the Standard might be simpler than reading a summary in Primer Plus. (A summary is a summary, and has to omit details!)

The relevant sections are:

[expr.shift]

The shift operators << and >> group left-to-right. The operands shall be of integral or unscoped enumeration type and integral promotions are performed. The type of the result is that of the promoted left operand. The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2 , reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the corresponding unsigned type of the result type, then that value, converted to the result type, is the resulting value; otherwise, the behavior is undefined.

[expr.unary.op]/10

The operand of ~ shall have integral or unscoped enumeration type; the result is the one’s complement of its operand. Integral promotions are performed. The type of the result is the type of the promoted operand.

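Note that neither of these performs the usual arithmetic conversions (the conversion to a common type that most of the binary operators apply).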
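One way to see this is to ask the compiler for the type of a shift expression: it follows the promoted left operand and ignores the type of the right one. The alias shift_result_t below is a hypothetical helper written for this sketch, and the char case assumes a normal system where int can represent every char value.

```cpp
#include <type_traits>
#include <utility>

// Type of the built-in expression `Left << Right`.
// `shift_result_t` is a made-up alias used only for this illustration.
template <typename Left, typename Right>
using shift_result_t = decltype(std::declval<Left>() << std::declval<Right>());

// An unsigned right operand does not make the result unsigned.
static_assert(std::is_same<shift_result_t<int, unsigned>, int>::value,
              "int << unsigned is int");
// An unsigned left operand does.
static_assert(std::is_same<shift_result_t<unsigned, int>, unsigned>::value,
              "unsigned << int is unsigned");
// char is promoted to int first (assuming int can hold every char value).
static_assert(std::is_same<shift_result_t<char, unsigned>, int>::value,
              "char << unsigned is int");
// ~ likewise yields the promoted type of its operand.
static_assert(std::is_same<decltype(~'q'), int>::value, "~char is int");
```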

The integral promotions:

[conv.prom]/1

A prvalue of an integer type other than bool, char16_t, char32_t, or wchar_t whose integer conversion rank is less than the rank of int can be converted to a prvalue of type int if int can represent all the values of the source type; otherwise, the source prvalue can be converted to a prvalue of type unsigned int.

(There are other entries covering the types in the "other than" list; I have omitted them here, but you can look them up in a draft of the Standard.)

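The thing to remember about the integral promotions is that they are value-preserving: if you have a char with value -30, then after promotion it will be an int with value -30. You don't need to think about things like "sign extension".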
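A tiny sketch of the value-preserving behaviour, using unary + because it applies the integral promotions and nothing else. The function name promote is hypothetical.

```cpp
#include <type_traits>

// Unary + performs the integral promotions on its operand, so the type of
// +c shows what a signed char is promoted to.
static_assert(std::is_same<decltype(+static_cast<signed char>(0)), int>::value,
              "signed char promotes to int");

// Returns the promoted value of c. `promote` is a made-up name for illustration.
int promote(signed char c) {
    return +c;  // same numeric value, now with type int
}
```

Here promote(-30) is simply -30; only the type changes, never the value.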

Your initial analysis of ~'q' is correct, and the result has type int (because on normal systems int can represent all the values of char).

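It turns out that any int with its most significant bit set represents a negative value (another part of the standard, which I haven't quoted here, has rules about this), so ~'q' is a negative int.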

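Looking at [expr.shift]/2, we see that this means the left shift causes undefined behaviour (it is not covered by any of the earlier cases in that paragraph).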
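The cases in the quoted wording can be turned into a small check. This is only a sketch of the rule as quoted above, for an int left operand; the name left_shift_is_defined is hypothetical, and it assumes int has no padding bits, so its width equals the number of value bits in unsigned.

```cpp
#include <limits>

// Returns true when `value << count` is defined under the wording quoted
// above, for a left operand of type int.
// `left_shift_is_defined` is a made-up helper name for illustration.
bool left_shift_is_defined(int value, int count) {
    // Assumption: no padding bits, so this is the width of int in bits.
    const int width = std::numeric_limits<unsigned>::digits;
    if (count < 0 || count >= width) {
        return false;  // negative or too-large shift count
    }
    if (value < 0) {
        return false;  // negative left operand: the "otherwise" case
    }
    // value * 2^count must be representable in unsigned int,
    // which is equivalent to value <= UINT_MAX >> count.
    return static_cast<unsigned>(value) <=
           (std::numeric_limits<unsigned>::max() >> count);
}
```

For the expression from the question, left_shift_is_defined(~0x71, 6) is false because ~0x71 is the negative int -114, while left_shift_is_defined(0x71, 6) is true.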
