How does GCC compile the 80 bit wide 10 byte float __float80 on x86_64?

Question

According to one of the slides in the What's A Creel video "Modern x64 Assembly 4: Data Types" (link to the slide),

Note: real10 is only used with the x87 FPU, it is largely ignored nowadays but offers amazing precision!

He says,

"Real10 is only used with the x87 Floating Point Unit. [...] It's interesting the massive gain in precision that it offers you. You kind of take a performance hit with that gain because you can't use real10 with SSE, packed, SIMD style instructions. But, it's kind of interesting because if you want extra precision you can go to the x87 style FPU. Now a days it's almost never used at all."

However, while Googling I saw that GCC supports __float80 and __float128.

Is __float80 in GCC calculated on the x87? Or does it use SIMD like the other floating-point operations? What about __float128?

Answer 1

I found the answer here, in the GCC documentation:

__float80 is available on the i386, x86_64, and IA-64 targets, and supports the 80-bit (XFmode) floating type. It is an alias for the type name _Float64x on these targets.

Looking up XFmode,

“Extended Floating” mode represents an IEEE extended floating point number. This mode only has 80 meaningful bits (ten bytes). Some processors require such numbers to be padded to twelve bytes, others to sixteen; this mode is used for either.
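To make the "80 meaningful bits" a little more concrete, the sketch below queries <float.h> for long double. It assumes long double uses the same x87 extended format as __float80, which holds for GCC on x86-64 Linux but not on every platform. The helper names are made up for illustration.

```c
#include <float.h>

/* Extra significand bits long double has over double.
   With the x87 extended format this is 64 - 53 = 11. */
int extra_significand_bits(void) {
    return LDBL_MANT_DIG - DBL_MANT_DIG;
}

/* Returns 1 if 1 + LDBL_EPSILON is still distinguishable from 1 after
   being rounded to double, 0 if the extra precision is lost. */
int epsilon_survives_double(void) {
    volatile long double wide = 1.0L + LDBL_EPSILON; /* smallest step above 1 */
    volatile double narrow = (double)wide;           /* force a real store as double */
    return narrow != 1.0;
}
```

On a target where long double is the x87 type, the first helper should report 11 extra bits and the second should return 0, since a double cannot hold that last bit.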

Still not totally convinced, I compiled something simple:

int main () {
    __float80 a = 1.445839898;
    return 1;
}

Dumping it with Radare gives:

0x00000652      db2dc8000000   fld xword [0x00000720]
0x00000658      db7df0         fstp xword [local_10h]

I believe fld and fstp are part of the x87 instruction set. So it's true that the x87 is used for __float80 10 byte floats. With __float128, however, I get

0x000005fe      660f6f05aa00.  movdqa xmm0, xmmword [0x000006b0]
0x00000606      0f2945f0       movaps xmmword [local_10h], xmm0

So we can see here that the value is moved through a SIMD xmmword register.
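For anyone who wants to reproduce the two disassemblies, a variant of the test program along these lines should do. It is only a sketch, assuming GCC targeting x86 or x86-64: `__SIZEOF_FLOAT80__` and `__SIZEOF_FLOAT128__` are macros GCC predefines there, the `long double` fallback is a stand-in for other compilers, and the function names are illustrative. The `volatile` is there only so the stores survive optimization.

```c
#if defined(__SIZEOF_FLOAT80__)
typedef __float80 ext80_t;
#else
typedef long double ext80_t;   /* stand-in where __float80 is missing */
#endif

#if defined(__SIZEOF_FLOAT128__)
typedef __float128 quad_t;
#else
typedef long double quad_t;    /* stand-in where __float128 is missing */
#endif

/* On x86-64 this store is expected to use the x87 fld/fstp pair. */
double roundtrip_float80(void) {
    volatile ext80_t a = 1.445839898;   /* same constant as the test above */
    return (double)a;
}

/* On x86-64 this value is expected to be moved through an XMM register. */
double roundtrip_float128(void) {
    volatile quad_t b = 1.445839898;
    return (double)b;                   /* narrowing __float128 is a libgcc call */
}
```

Compiling this to an object file and dumping it with Radare or objdump should show the same kind of instructions as above. Both functions return the original double, since widening a double and narrowing it again is exact.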

Answer 2

GCC docs for Additional Floating Types:

ISO/IEC TS 18661-3:2015 defines C support for additional floating types _Floatn and _Floatnx

... GCC does not currently support _Float128x on any systems.

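_Float128 (without the x) is IEEE binary128, i.e. a true 128-bit float with a huge exponent range. _Float128x would be an extended format with at least as much range and precision as that, and GCC does not provide it. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1691.pdf.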

__float80 is obviously the x87 10-byte type. In the x86-64 SysV ABI, it's the same as long double; both have 16-byte alignment in that ABI.

__float80 is available on the i386, x86_64, and IA-64 targets, and supports the 80-bit (XFmode) floating type. It is an alias for the type name _Float64x on these targets.

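__float128 is passed around in SSE registers, but it is not a "double double" format (twice the mantissa width with the same exponent limits as a 64-bit double). The docs say:

On i386, x86_64, and ..., __float128 is an alias for _Float128

and _Float128 is IEEE binary128: a 113-bit significand with a 15-bit exponent, i.e. the same exponent range as __float80. x86 has no hardware arithmetic for that format, so the operations are pure software floating point.

For comparison, these questions cover double-double, which is a different technique from what gcc gives you with __float128:

Optimize for fast multiplication but slow addition: FMA and doubledouble
double-double implementation resilient to FPU rounding mode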
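One way to tell a true binary128 from a double-double is the exponent range, since a double-double overflows at the same point as a plain double. The sketch below checks that, plus one precision probe. It assumes a compiler that predefines `__SIZEOF_FLOAT128__` when `__float128` exists (GCC on x86 does); the `long double` fallback and the helper names are only illustrative.

```c
#if defined(__SIZEOF_FLOAT128__)
typedef __float128 quad_t;
#else
typedef long double quad_t;    /* stand-in where __float128 is missing */
#endif

/* Returns 1 if quad_t holds 1e308 * 1e308 as a finite value.
   A double-double would overflow to infinity here; binary128,
   with its 15-bit exponent, represents roughly 1e616 without trouble. */
int exceeds_double_range(void) {
    volatile quad_t big = (quad_t)1e308;
    quad_t product = big * big;
    return product * (quad_t)0.5 != product;  /* infinity * 0.5 == infinity */
}

/* Returns 1 if 1 + 2^-100 differs from 1, which needs more than the
   64 significand bits that __float80 offers. */
int resolves_two_pow_minus_100(void) {
    volatile quad_t one = (quad_t)1.0;
    quad_t sum = one + (quad_t)0x1p-100;      /* C99 hex float literal */
    return sum != one;
}
```

With GCC's __float128 on x86-64 both helpers should return 1, which is consistent with binary128 rather than a double-double.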

Godbolt compiler explorer for gcc7.3 -O3 (gcc4.6 gives the same output, so apparently these types aren't new):

//long double add_ld(long double x) { return x+x; }  // same as __float80
__float80 add80(__float80 x) { return x+x; }

    fld     TBYTE PTR [rsp+8]    # arg on the stack
    fadd    st, st(0)
    ret                          # and returned in st(0)


__float128 add128(__float128 x) { return x+x; }

          # IDK why not movapd or better movaps, silly compiler
    movdqa  xmm1, xmm0           # x arg in xmm0
    sub     rsp, 8               # align the stack
    call    __addtf3             # args in xmm0, xmm1
    add     rsp, 8
    ret                          # return value in xmm0, I assume


int size80 = sizeof(__float80);    // 16
int sizeld = sizeof(long double);  // 16

int size128 = sizeof(__float128);  // 16
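The 16 reported for __float80 includes padding on top of the 10 value bytes. A small sketch of that split follows, using long double as a stand-in for __float80 (an assumption that matches the x86-64 SysV ABI but not every platform). The helper names are hypothetical.

```c
#include <float.h>
#include <stddef.h>

/* Bytes that carry value bits: 10 for the x87 extended format
   (recognized here by its 64-bit significand); otherwise this sketch
   assumes the whole object is used. */
size_t value_bytes(void) {
    return (LDBL_MANT_DIG == 64) ? 10 : sizeof(long double);
}

/* Bytes that sizeof reports beyond the value bytes. */
size_t padding_bytes(void) {
    return sizeof(long double) - value_bytes();
}

/* Alignment the ABI requires for the type. */
size_t alignment_bytes(void) {
    return _Alignof(long double);
}
```

Under the x86-64 SysV ABI these should come out as 10 value bytes, 6 padding bytes and 16-byte alignment.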

So gcc calls a libgcc function for __float128 addition, rather than inlining an increment to the exponent or anything clever like that.
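To see that the call in the asm above is an ordinary function call, the same libgcc routine can be called by hand. This is a sketch that assumes GCC on x86-64 linking against libgcc in the usual way. The declaration of `__addtf3` is written out manually here, and `add128_explicit` is an illustrative name. The `long double` fallback is only a stand-in for targets without `__float128`.

```c
#if defined(__SIZEOF_FLOAT128__)
typedef __float128 quad_t;
#else
typedef long double quad_t;    /* stand-in where __float128 is missing */
#endif

/* libgcc's TFmode (128-bit float) addition routine, declared by hand. */
extern quad_t __addtf3(quad_t a, quad_t b);

/* What the compiler emits for x + x, spelled out explicitly. */
quad_t add128_explicit(quad_t x) {
    return __addtf3(x, x);
}

/* The ordinary form; the compiler inserts the libgcc call itself. */
quad_t add128(quad_t x) {
    return x + x;
}
```

Both versions should produce identical results, since they end up in the same routine.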
