Why does the conversion (unsigned long long)DBL_MAX (or FLT_MAX) also raise FE_INEXACT?

Code (t1.c):

#include <stdio.h>
#include <float.h>
#include <fenv.h>

#if _MSC_VER
#pragma fenv_access (on)
#else
#pragma STDC FENV_ACCESS ON
#endif


void print_fpe(void)
{
    int fpe = fetestexcept(FE_ALL_EXCEPT);
    printf("current exceptions raised:");
    if (fpe & FE_DIVBYZERO)       printf(" FE_DIVBYZERO");
    if (fpe & FE_INEXACT)         printf(" FE_INEXACT");
    if (fpe & FE_INVALID)         printf(" FE_INVALID");
    if (fpe & FE_OVERFLOW)        printf(" FE_OVERFLOW");
    if (fpe & FE_UNDERFLOW)       printf(" FE_UNDERFLOW");
    if ((fpe & FE_ALL_EXCEPT)==0) printf(" none");
}

volatile double d = DBL_MAX;
volatile float f = FLT_MAX;
volatile signed long long ll;
volatile signed long l;
volatile signed int i;
volatile signed short s;
volatile signed char c;
volatile unsigned long long ull;
volatile unsigned long ul;
volatile unsigned int ui;
volatile unsigned short us;
volatile unsigned char uc;

#define TEST(dst, type, src)         \
    feclearexcept(FE_ALL_EXCEPT);    \
    dst = (type)(src);               \
    print_fpe();                     \
    printf(" line %u\n", __LINE__);

int main(void)
{
    TEST(ll, signed long long, d);
    TEST(l, signed long, d);
    TEST(i, signed int, d);
    TEST(s, signed short, d);
    TEST(c, signed char, d);
    TEST(ll, signed long long, f);
    TEST(l, signed long, f);
    TEST(i, signed int, f);
    TEST(s, signed short, f);
    TEST(c, signed char, f);
    TEST(ull, unsigned long long, d); // line 55
    TEST(ul, unsigned long, d);
    TEST(ui, unsigned int, d);
    TEST(us, unsigned short, d);
    TEST(uc, unsigned char, d);
    TEST(ull, unsigned long long, f); // line 60
    TEST(ul, unsigned long, f);
    TEST(ui, unsigned int, f);
    TEST(us, unsigned short, f);
    TEST(uc, unsigned char, f);
    return 0;
}

Invocation and results:

$ cl t1.c && t1
current exceptions raised: FE_INVALID line 45
current exceptions raised: FE_INVALID line 46
current exceptions raised: FE_INVALID line 47
current exceptions raised: FE_INVALID line 48
current exceptions raised: FE_INVALID line 49
current exceptions raised: FE_INVALID line 50
current exceptions raised: FE_INVALID line 51
current exceptions raised: FE_INVALID line 52
current exceptions raised: FE_INVALID line 53
current exceptions raised: FE_INVALID line 54
current exceptions raised: FE_INEXACT FE_INVALID line 55
current exceptions raised: FE_INVALID line 56
current exceptions raised: FE_INVALID line 57
current exceptions raised: FE_INVALID line 58
current exceptions raised: FE_INVALID line 59
current exceptions raised: FE_INEXACT FE_INVALID line 60
current exceptions raised: FE_INVALID line 61
current exceptions raised: FE_INVALID line 62
current exceptions raised: FE_INVALID line 63
current exceptions raised: FE_INVALID line 64

$ clang t1.c && ./a.exe
t1.c:8:14: warning: pragma STDC FENV_ACCESS ON is not supported, ignoring pragma [-Wunknown-pragmas]
#pragma STDC FENV_ACCESS ON
             ^
1 warning generated.
current exceptions raised: FE_INVALID line 45
current exceptions raised: FE_INVALID line 46
current exceptions raised: FE_INVALID line 47
current exceptions raised: FE_INVALID line 48
current exceptions raised: FE_INVALID line 49
current exceptions raised: FE_INVALID line 50
current exceptions raised: FE_INVALID line 51
current exceptions raised: FE_INVALID line 52
current exceptions raised: FE_INVALID line 53
current exceptions raised: FE_INVALID line 54
current exceptions raised: FE_INEXACT FE_INVALID line 55
current exceptions raised: FE_INEXACT FE_INVALID line 56
current exceptions raised: FE_INVALID line 57
current exceptions raised: FE_INVALID line 58
current exceptions raised: FE_INVALID line 59
current exceptions raised: FE_INEXACT FE_INVALID line 60
current exceptions raised: FE_INEXACT FE_INVALID line 61
current exceptions raised: FE_INVALID line 62
current exceptions raised: FE_INVALID line 63
current exceptions raised: FE_INVALID line 64

$ gcc t1.c && ./a.exe
current exceptions raised: FE_INVALID line 45
current exceptions raised: FE_INVALID line 46
current exceptions raised: FE_INVALID line 47
current exceptions raised: FE_INVALID line 48
current exceptions raised: FE_INVALID line 49
current exceptions raised: FE_INVALID line 50
current exceptions raised: FE_INVALID line 51
current exceptions raised: FE_INVALID line 52
current exceptions raised: FE_INVALID line 53
current exceptions raised: FE_INVALID line 54
current exceptions raised: FE_INEXACT FE_INVALID line 55
current exceptions raised: FE_INEXACT FE_INVALID line 56
current exceptions raised: FE_INVALID line 57
current exceptions raised: FE_INVALID line 58
current exceptions raised: FE_INVALID line 59
current exceptions raised: FE_INEXACT FE_INVALID line 60
current exceptions raised: FE_INEXACT FE_INVALID line 61
current exceptions raised: FE_INVALID line 62
current exceptions raised: FE_INVALID line 63
current exceptions raised: FE_INVALID line 64

Question: why does the conversion (unsigned long long)DBL_MAX (or FLT_MAX) raise FE_INEXACT in addition to FE_INVALID?

According to C11 6.3.1.4, the code (unsigned long long)DBL_MAX has undefined behavior:

When a finite value of real floating type is converted to an integer type other than _Bool, the fractional part is discarded (i.e., the value is truncated toward zero). If the value of the integral part cannot be represented by the integer type, the behavior is undefined.

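Since the behavior is undefined, "anything can happen": the standard imposes no requirements on what the program does here, including which floating-point exception flags end up set.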
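
If you need a conversion whose behavior is defined for every input, one option is to check the range before casting. The following is only a minimal sketch: the helper name double_to_ull_checked is made up for illustration, and it assumes a 64-bit unsigned long long and an IEEE 754 binary64 double.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of the question's code.
   Stores the truncated value in *out and returns true only when the
   conversion is defined by C11 6.3.1.4; returns false otherwise. */
static bool double_to_ull_checked(double d, unsigned long long *out)
{
    /* 2^64 as a double; assumes unsigned long long is exactly 64 bits wide. */
    const double two64 = 18446744073709551616.0;

    assert(out != NULL);
    /* Values in (-1, 2^64) truncate to something representable;
       NaN, infinities and everything else are rejected. */
    if (isnan(d) || d <= -1.0 || d >= two64)
        return false;
    *out = (unsigned long long)d;
    return true;
}
```

With a guard like this, DBL_MAX is rejected before the cast is ever evaluated, so the out-of-range conversion never runs.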

I assume you are testing this on x86, since I see the behavior you describe on x86. Example. Here is the low-level explanation.

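On x86-64, gcc at least uses the cvttsd2si instruction for most floating-point-to-integer conversions. It converts a double to a 32-bit or 64-bit signed integer and raises the "invalid" exception if the result is out of range. This instruction can be used for conversion to any signed integer type, and also for conversion to unsigned integer types of 32 bits or fewer: for example, conversion to unsigned 32-bit can be done by converting to signed 64-bit and discarding the high bits.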
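
In C terms, the unsigned 32-bit trick looks roughly like the sketch below. The function name is made up, and whether a given compiler emits exactly one cvttsd2si for it is an assumption; the point is only that a signed 64-bit conversion covers the whole uint32_t range.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration: convert to unsigned 32-bit by going through
   a signed 64-bit integer and keeping only the low 32 bits. */
static uint32_t double_to_u32_via_i64(double d)
{
    /* Precondition: the truncated value fits in uint32_t.
       Outside this range the plain C conversion would be undefined. */
    assert(d > -1.0 && d < 4294967296.0);

    int64_t wide = (int64_t)d;   /* the signed 64-bit conversion */
    return (uint32_t)wide;       /* discard the (zero) high bits */
}
```

Every value in the uint32_t range also fits in int64_t, so no extra floating-point arithmetic is needed on this path.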

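But this doesn't work for conversion to unsigned 64-bit, because the input might be a number that does not fit in signed 64-bit but does fit in unsigned 64-bit, and x86 (at least before AVX-512) has no instruction that performs that conversion directly. So some extra arithmetic is needed, and it is those extra instructions that raise the "inexact" exception. (Specifically, the code does a subsd to subtract (double)LLONG_MAX, i.e. 2^63, from the input, which does lose precision when the input is DBL_MAX.)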
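
The extra arithmetic can be sketched in C as follows. This is an illustration of the general subtract-and-restore idea, not a transcription of gcc's actual instruction sequence; the function name is made up, and the assert stands in for the fact that the C-level conversion is only defined for in-range inputs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical illustration of converting double to unsigned 64-bit
   using only a signed 64-bit conversion plus one subtraction. */
static uint64_t double_to_u64_via_i64(double d)
{
    const double two63 = 9223372036854775808.0;  /* 2^63, exact in double */

    /* Precondition: the truncated value fits in uint64_t. */
    assert(d > -1.0 && d < 2.0 * two63);

    if (d < two63)
        return (uint64_t)(int64_t)d;  /* fits in signed 64-bit as is */

    /* Shift the value down into the signed range, convert,
       then put the top bit back. */
    return (uint64_t)(int64_t)(d - two63) | (UINT64_C(1) << 63);
}
```

For inputs in [2^63, 2^64) the subtraction is exact, so nothing is flagged. The machine code has no precondition check, though: with DBL_MAX as input, 2^63 is far smaller than the spacing between neighbouring doubles at that magnitude, so the difference rounds back to DBL_MAX and "inexact" is raised, and the following signed conversion is still out of range and raises "invalid".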

See "Unsigned 64-bit to double conversion: why this algorithm from g++" for an example of the sort of gymnastics gcc goes through to do such conversions as efficiently as possible.

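Note that on x86-64 targets where unsigned long is 64 bits wide, you will also see FE_INEXACT for the conversion to unsigned long, because it is then the same as unsigned long long. I get exactly the behavior you observed on x86-32, where unsigned long long is the only 64-bit type involved. The code in that case is somewhat more complicated, and I'll leave it to you to read the assembly if you are really interested.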

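By contrast, when I run this code on AArch64, every line simply reports FE_INVALID. That is because AArch64 does have a dedicated instruction for converting floating point to unsigned 64-bit (fcvtzu), so no further arithmetic is involved that could produce an inexact result.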