C++ Addition of very large unsigned long and double

Question

Consider the following code:

#include <iostream>
#include <limits>

int main(int argc, char **argv) {
  unsigned long n = 10ul;
  unsigned long ul = std::numeric_limits<unsigned long>::max() - n;
  double d = 1.;
  ul += d;
  std::cout << ul << std::endl;
}

One might expect the output to be std::numeric_limits<unsigned long>::max() - 9. However, for all values n < 1024, the output of this code is 0. Why?

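Some observations so far, which I would like to understand:

- We do not exceed std::numeric_limits<unsigned long>::max(), so (mathematically speaking) no overflow of ul should occur.
- Casting d when adding it gives the expected value for ul (change the line ul += d; to ul += static_cast<unsigned long>(d);).

My guess at what happens:
- ul += d is resolved as ul = (double)ul + d
- This addition is performed as a 64-bit floating-point operation.
- The resulting value cannot be represented exactly as a double, so the result is std::numeric_limits<unsigned long>::max() + 1.
- This result is then converted back to unsigned long, which overflows/wraps around to 0.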

Edit

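Some tests seem to support my guess above.

- double x = std::numeric_limits<unsigned long>::max() results in x holding the value std::numeric_limits<unsigned long>::max() + 1.
- Yes, my unsigned long is 64 bits wide.
- The question is not why double is inexact. I understand the concept of floating-point numbers. The question is what exact rules C++ uses to evaluate the expression, and in which data format, such that they lead to this unfortunate result.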

Answer 1

The question is what are the exact rules for C++ for evaluating an expression in which data format that lead to this unfortunate result.

Let's examine the line:

ul += d;

where d has type double and ul has type unsigned long.

From 7.6.19 Assignment and compound assignment operators:

The behavior of an expression of the form E1 op= E2 is equivalent to E1 = E1 op E2 except that E1 is evaluated only once

So ul += d is equivalent to ul = ul + d.

From 7.6.6 Additive operators:

The additive operators + and - group left-to-right. The usual arithmetic conversions are performed for operands of arithmetic or enumeration type.

So the usual arithmetic conversions are applied to both ul and d in ul + d.

From 7.4 Usual arithmetic conversions:

[...] This pattern is called the usual arithmetic conversions, which are defined as follows:

[...]

Otherwise, if either operand is double, the other shall be converted to double.

[...]

Therefore ul is converted to double in ul + d.

From 7.3.11 Floating-integral conversions:

A prvalue of an integer type or of an unscoped enumeration type can be converted to a prvalue of a floating-point type. The result is exact if possible. If the value being converted is in the range of values that can be represented but the value cannot be represented exactly, it is an implementation-defined choice of either the next lower or higher representable value.

If the value being converted is outside the range of values that can be represented, the behavior is undefined.

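So if the value of ul cannot be represented exactly as a double, it is implementation-defined whether the next lower or the next higher representable value is chosen.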

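Then, after the calculation, the double result is converted back to unsigned long when it is assigned to ul. With a 64-bit unsigned long, as in the question, that result is max() + 1, which unsigned long cannot hold, so the following rule, also from Floating-integral conversions, applies: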

A prvalue of a floating-point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type.

The output of this code is 0 for all values n < 1024. Why?

The gcc compiler documents that it follows C99 Annex F when converting floating-point values to integers and back; see the gcc 11.1.0 docs, implementation-defined behavior, 4.6 Floating point. As I read C99 Annex F, though, the result is unspecified, but a floating-point exception is required to be raised. The following code, with a function copied from the cppreference feexceptflag page:

#include <cfenv>     // fetestexcept, FE_* macros
#include <cstdio>    // printf
#include <iostream>
#include <limits>

void show_fe_exceptions(void)
{
    printf("current exceptions raised: ");
    if(fetestexcept(FE_DIVBYZERO))     printf(" FE_DIVBYZERO");
    if(fetestexcept(FE_INEXACT))       printf(" FE_INEXACT");
    if(fetestexcept(FE_INVALID))       printf(" FE_INVALID");
    if(fetestexcept(FE_OVERFLOW))      printf(" FE_OVERFLOW");
    if(fetestexcept(FE_UNDERFLOW))     printf(" FE_UNDERFLOW");
    if(fetestexcept(FE_ALL_EXCEPT)==0) printf(" none");
    printf("\n");
}

int main(int argc, char **argv) {
  unsigned long n = 10ul;
  unsigned long ul = std::numeric_limits<unsigned long>::max() - n;
  double d = 1.;
  show_fe_exceptions();
  ul += d;
  show_fe_exceptions();
  std::cout << ul << std::endl;
}

outputs the following on godbolt, confirming that the exceptions are raised:

current exceptions raised:  none
current exceptions raised:  FE_INEXACT FE_INVALID
0


c++

integer-overflow