What happens if "-ffast-math" is enabled when linking?
I am using both gcc 10 and clang 12 on Ubuntu. I just found that enabling the -ffast-math flag gives my C++ project roughly a 4x performance boost.
However, if I only enable -ffast-math at compile time and not at link time, there is no performance improvement. What does it mean to use -ffast-math when linking, and will it link to any special ffast-math libraries in the system?
P.S.: This performance boost really just brings performance back to normal. I previously asked a question about poor performance of AVX instructions on Intel processors. Now, as long as I compile and link the program on Linux with the -ffast-math flag, performance is normal, but even when I use clang with -ffast-math on […], performance is still poor. So I wonder whether I am linking against some special system library on Linux.
It turns out that gcc links in crtfastmath.o when -ffast-math is passed at link time (an undocumented feature).
For x86, see https://github.com/gcc-mirror/gcc/blob/master/libgcc/config/i386/crtfastmath.c#L83, which sets the following CPU options:
#define MXCSR_DAZ (1 << 6) /* Enable denormals are zero mode */
#define MXCSR_FTZ (1 << 15) /* Enable flush to zero mode */
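Denormal (subnormal) floating-point numbers are much slower to process, so having the CPU treat them as zero makes floating-point calculations faster.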
From the Intel 64 and IA-32 Architectures Optimization Reference Manual:
6.5.3 Flush-to-Zero and Denormals-are-Zero Modes
The flush-to-zero (FTZ) and denormals-are-zero (DAZ) modes are not compatible with the IEEE Standard 754. They are provided to improve performance for applications where underflow is common and where the generation of a denormalized result is not necessary.
3.8.3.3 Floating-point Exceptions in SSE/SSE2/SSE3 Code
Most special situations that involve masked floating-point exceptions are handled efficiently in hardware. When a masked overflow exception occurs while executing SSE/SSE2/SSE3 code, processor hardware can handle it without performance penalty.
Underflow exceptions and denormalized source operands are usually treated according to the IEEE 754 specification, but this can incur significant performance delay. If a programmer is willing to trade pure IEEE 754 compliance for speed, two non-IEEE 754 compliant modes are provided to speed situations where underflows and input are frequent: FTZ mode and DAZ mode.
When the FTZ mode is enabled, an underflow result is automatically converted to a zero with the correct sign. Although this behavior is not compliant with IEEE 754, it is provided for use in applications where performance is more important than IEEE 754 compliance. Since denormal results are not produced when the FTZ mode is enabled, the only denormal floating-point numbers that can be encountered in FTZ mode are the ones specified as constants (read only).
The DAZ mode is provided to handle denormal source operands efficiently when running a SIMD floating-point application. When the DAZ mode is enabled, input denormals are treated as zeros with the same sign. Enabling the DAZ mode is the way to deal with denormal floating-point constants when performance is the objective.
If departing from the IEEE 754 specification is acceptable and performance is critical, run SSE/SSE2/SSE3 applications with FTZ and DAZ modes enabled.