C uses a different data type for arithmetic in the middle of an expression?

Question

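In Go (the language I'm most familiar with), the result of a mathematical operation always has the same data type as the operands, which means that if the operation overflows, the result will be incorrect. For example: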

package main

import "fmt"

func main() {
    var a byte = 100
    var b byte = 9
    var r byte = (a << b) >> b
    fmt.Println(r)
}

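This prints 0, because all the bits are shifted out of the bounds of the byte during the initial << 9 operation, and then zeros are shifted back in during the >> 9 operation.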

However, this is not the case in C:

#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char r = (a << b) >> b;
    printf("%d\n", r);
    return 0;
}

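This code prints 100. Although this yields the "correct" result, it was unexpected to me, because I would only expect promotion if one of the operands were larger than a byte, and here all operands are bytes. It is as if the temporary holding the result of the << 9 operation is larger than the result variable, and is only converted back down to a byte after the full RHS has been evaluated, that is, after the >> 9 operation has restored the bits.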

Obviously, if you explicitly store the result of the << 9 into a byte before continuing, you get the same result as in Go:

#include <stdio.h>

int main(void) {
    unsigned char a = 100;
    unsigned char b = 9;
    unsigned char c = a << b;   /* truncated to a byte here */
    unsigned char r = c >> b;
    printf("%d\n", r);
    return 0;
}

This is not limited to the bitwise operators. I also tested multiplication/division, and it shows the same behavior.

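My question is: is this behavior of C defined? If so, where? Does C actually use a specific data type for the intermediate values of a complex expression? Or is this actually undefined behavior, such as an accidental result of the operations being performed in a 32/64-bit CPU register before being saved back to memory?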

Answer 1

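Welcome to integer promotions! One behavior of the C language (an often criticized one, I might add) is that types like char and short are promoted to int before any arithmetic operation is performed on them, and the result is also an int. What does this mean?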

#include <stdio.h>

unsigned char foo(unsigned char x) {
  return (x << 4) >> 4;
}

int main(void) {
  if (foo(0xFF) == 0x0F) {
    printf("Yay!\n");
  }
  else {
    printf("... hey, wait a minute!\n");
  }

  return 0;
}

Needless to say, the code above prints "... hey, wait a minute!". Let's find out why:

// this line of code:
return (x << 4) >> 4;

// is converted to this (because of integer promotion):
return ((int) x << 4) >> 4;

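So, this is what happens:

x is an unsigned char (8 bits) and its value is 0xFF,
x << 4 needs to be executed, but first x is converted to int (32 bits),
x << 4 becomes 0x000000FF << 4, and the result 0x00000FF0 is also an int,
0x00000FF0 >> 4 is executed, yielding 0x000000FF,
finally, 0x000000FF is converted to unsigned char (because that is the return type of foo()), so it becomes 0xFF,
and that is why foo(0xFF) yields 0xFF instead of 0x0F.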

How can you prevent this? Simple: cast the result of x << 4 to unsigned char. In the previous example, 0x00000FF0 would then become 0xF0.

unsigned char foo(unsigned char x) {
  return ((unsigned char) (x << 4)) >> 4;
}

foo(0xFF) == 0x0F

Note: the previous examples assume that unsigned char is 8 bits and int is 32 bits, but they work on essentially any implementation with CHAR_BIT == 8 (because C17 requires sizeof(int) * CHAR_BIT >= 16).

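P.S.: Of course, this answer is not as exhaustive as the official C standard document. You can find all the (valid and defined) behavior of C described in the latest draft of the ISO/IEC 9899:2018 standard (a.k.a. C17/C18).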

Answer 2

C 2018 6.5.7 discusses the shift operators. Paragraph 3 says:

The integer promotions are performed on each of the operands…

6.3.1.1 paragraph 2 specifies the integer promotions:

… If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

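So in a << b, where a and b are unsigned char, a is promoted to int, which is at least 16 bits. (A C implementation may define unsigned char to be wider than eight bits. It may even have the same width as int. In that case, per the rule quoted above, the integer promotions convert a and b to unsigned int rather than int.)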

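Note that if the integer promotions were not applied, the behavior of evaluating a << b with b equal to 9 would not be defined by the C standard, because the behavior of the shift operators is not defined for shift amounts greater than or equal to the width of the left operand.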

6.5.5 specifies the multiplicative operators. Paragraph 3 says:

The usual arithmetic conversions are performed on the operands.

6.3.1.8 specifies the usual arithmetic conversions:

… First, if the corresponding real type of either operand is long double, the other operand is converted, without change of type domain [complex or real], to a type whose corresponding real type is long double.

Otherwise, if the corresponding real type of either operand is double, the other operand is converted, without change of type domain, to a type whose corresponding real type is double.

Otherwise, if the corresponding real type of either operand is float, the other operand is converted, without change of type domain, to a type whose corresponding real type is float.

Otherwise, the integer promotions are performed on both operands. Then the following rules are applied to the promoted operands:

If both operands have the same type, then no further conversion is needed.

Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank is converted to the type of the operand with greater rank.

Otherwise, if the operand that has unsigned integer type has rank greater or equal to the rank of the type of the other operand, then the operand with signed integer type is converted to the type of the operand with unsigned integer type.

Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, then the operand with unsigned integer type is converted to the type of the operand with signed integer type.

Otherwise, both operands are converted to the unsigned integer type corresponding to the type of the operand with signed integer type.

Rank has a technical definition that largely corresponds to width (the number of bits in an integer type).

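So in a * b, where a and b are unsigned char, both are promoted to int (with the caveat above about a wide unsigned char) and no further conversion is needed. If one operand were wider than int, say long long int, while the other was unsigned char, both operands would be converted to that wider type.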
