How does GCC compile the 80 bit wide 10 byte float __float80 on x86_64?
According to a slide in the What's A Creel video "Modern x64 Assembly 4: Data Types" (link to the slide),
Note: real10 is only used with the x87 FPU, it is largely ignored nowadays but offers amazing precision!
He says,
"Real10 is only used with the x87 Floating Point Unit. [...] It's interesting the massive gain in precision that it offers you. You kind of take a performance hit with that gain because you can't use real10 with SSE, packed, SIMD style instructions. But, it's kind of interesting because if you want extra precision you can go to the x87 style FPU. Now a days it's almost never used at all."
However, while googling I saw that GCC supports __float80 and __float128.
Is __float80 in GCC calculated on the x87, or does it use SIMD like the other floating-point operations? What about __float128?
I found the answer here:
__float80 is available on the i386, x86_64, and IA-64 targets, and supports the 80-bit (XFmode) floating type. It is an alias for the type name _Float64x on these targets.
Looking up XFmode,
“Extended Floating” mode represents an IEEE extended floating point number. This mode only has 80 meaningful bits (ten bytes). Some processors require such numbers to be padded to twelve bytes, others to sixteen; this mode is used for either.
Still not totally convinced, I compiled something simple:
int main () {
__float80 a = 1.445839898;
return 1;
}
I dumped it with Radare:
0x00000652 db2dc8000000 fld xword [0x00000720]
0x00000658 db7df0 fstp xword [local_10h]
I believe fld and fstp are part of the x87 instruction set, so the x87 is indeed used for the 10-byte __float80. With __float128, however, I get
0x000005fe 660f6f05aa00. movdqa xmm0, xmmword [0x000006b0]
0x00000606 0f2945f0 movaps xmmword [local_10h], xmm0
So here we can see the value being moved through the SIMD xmm registers (xmmword).
GCC docs for Additional Floating Types:
ISO/IEC TS 18661-3:2015 defines C support for additional floating types _Floatn and _Floatnx ... GCC does not currently support _Float128x on any systems.
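As far as I can tell, _Float128 is IEEE binary128, i.e. a true 128-bit float with a huge exponent range, and _Float128x would be an extended format at least that wide. See http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1691.pdf.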
__float80 is obviously the x87 10-byte type. In the x86-64 SysV ABI it is the same as long double; both have 16-byte alignment in that ABI.
__float80 is available on the i386, x86_64, and IA-64 targets, and supports the 80-bit (XFmode) floating type. It is an alias for the type name _Float64x on these targets.
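My first guess was that __float128 is an extended-precision type using SSE2, presumably a "double double" format with twice the mantissa width but the same exponent limits as a 64-bit double (i.e. less exponent range than __float80). The docs, however, make it an alias for _Float128, which is IEEE binary128: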
On i386, x86_64, and ..., __float128 is an alias for _Float128
- Optimize for fast multiplication but slow addition: FMA and doubledouble
- double-double implementation resilient to FPU rounding mode
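These describe double-double arithmetic, which does not appear to be what gcc gives you with __float128. It looks like pure software 128-bit floating point instead, as the call to __addtf3 in the compiler output below suggests.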
Godbolt compiler explorer for gcc7.3 -O3 (same as gcc4.6, so apparently these types aren't new)
//long double add_ld(long double x) { return x+x; } // same as __float80
__float80 add80(__float80 x) { return x+x; }
fld TBYTE PTR [rsp+8] # arg on the stack
fadd st, st(0)
ret # and returned in st(0)
__float128 add128(__float128 x) { return x+x; }
# IDK why not movapd or better movaps, silly compiler
movdqa xmm1, xmm0 # x arg in xmm0
sub rsp, 8 # align the stack
call __addtf3 # args in xmm0, xmm1
add rsp, 8
ret # return value in xmm0, I assume
int size80 = sizeof(__float80); // 16
int sizeld = sizeof(long double); // 16
int size128 = sizeof(__float128); // 16
So gcc calls a libgcc function for __float128 addition, rather than inlining an increment to the exponent or anything clever like that.