PPC64 long double's machine epsilon calculation

Question

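I'm playing with a PPC64 virtual machine emulated in qemu, trying to mimic a POWER8 CPU.

Here, the long double type differs from the 80-bit floating-point format used for long double on x86. As far as I can tell, it also does not conform to IEEE754 float128, because the C macro LDBL_MANT_DIG reports a 106-bit mantissa (versus the 112-bit mantissa IEEE754 specifies for float128).

Wikipedia says the machine epsilon of IEEE754 float128 should be around 1.93e-34, which is much better than the 80-bit x86 float (1.08e-19).

However, when I try to get the machine epsilon in this virtual machine, I get a rather surprising answer: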

#include <iostream>
int main()
{
    long double eps = 1.0l;
    while (1.0l + 0.5l * eps != 1.0l)
        eps = 0.5l * eps;
    std::cout << eps << std::endl;
    return 0;
}

The output is:

4.94066e-324

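I get the same result from LDBL_EPSILON and std::numeric_limits<long double>::epsilon().

That would make it about 10 times as precise as expected, and logic tells me this should not be possible. Since the mantissa is exactly 2x53 bits (53 being the mantissa of IEEE754 float64), I assume it may be using a double-double structure, which Wikipedia also says should be less precise than IEEE754 float128 for small numbers.

What is going on here?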

Answer 1

First, I'm assuming your operating system is Linux. So far, all compilers on 64-bit PowerPC default to the "double-double" type for long double, and that format is not IEEE compliant.

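The format is really just a pair of doubles, so think of it as struct { double high; double low; }. The high part is no different from a normal double, while the low part provides the extended mantissa. The overall exponent range is the same as double's, which means the largest representable number is not orders of magnitude larger than the largest double (there is still a small difference, because double-double has a longer mantissa).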

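Currently there are no native PowerPC instructions for double-double arithmetic. An a+b on long double ends up being translated into a call to the function __gcc_qadd. (Both LLVM's compiler-rt and GCC's libgcc have their own implementations; see the source code of the add function.)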

Since POWER ISA 3.0 (Power9), there is native instruction support for "binary128" (the IEEE-compliant 128-bit float type). You can use -mcpu=power9 -mfloat128 to enable the feature and use __float128 to represent it, or add -mabi=ieeelongdouble to make the compiler treat long double as binary128 instead of double-double (along with the C library declarations).

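Binary128 is not a pair of doubles. It is stored and passed in vector registers, and it has much better precision than double-double, with a 112-bit mantissa and a 15-bit exponent. GCC and Clang actually support binary128 on any target with VSX (Power7 or later) by using support functions in compiler-rt/libgcc. (For example, a+b generates the xsaddqp instruction on Power9 and later, and a call to __addkf3 on Power7 and Power8.)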

If your toolchain is recent enough (for example, Advanced Toolchain 14 or newer), try enabling -mabi=ieeelongdouble with C code. Once C++ library support is finished, GCC and Clang plan to switch the default long double type on 64-bit little endian to binary128.

floating-point

powerpc

epsilon

ieee-754

double-double-arithmetic