How to correctly determine -march and -mtune for Intel processors?

Question

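I am currently building, from source, a piece of software whose performance is critical to me. I therefore want to optimize it to run on my specific Intel CPU. The build process requires me to set the -march and -mtune flags.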

If on my processor node I use

gcc -march=native -Q --help=target|grep march
gcc -mtune=native -Q --help=target|grep mtune

I get "core-avx2" for march and "generic" for mtune. However, with

cat /proc/cpuinfo

I get:

processor   : 23
vendor_id   : GenuineIntel
cpu family  : 6
model       : 63
model name  : Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz
stepping    : 2
microcode   : 0x3d
cpu MHz     : 2599.993
cache size  : 30720 KB
physical id : 1
siblings    : 12
core id     : 13
cpu cores   : 12
apicid      : 58
initial apicid  : 58
fpu     : yes
fpu_exception   : yes
cpuid level : 15
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt xsave avx f16c rdrand lahf_lm abm epb intel_ppin ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid cqm xsaveopt cqm_llc cqm_occup_llc dtherm ida arat pln pts
bogomips    : 4599.35
clflush size    : 64
cache_alignment : 64
address sizes   : 46 bits physical, 48 bits virtual
power management:

On Intel's ARK page for the Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz (https://ark.intel.com/content/www/de/de/ark/products/81709/intel-xeon-processor-e5-2670-v3-30m-cache-2-30-ghz.html), I found: Code Name -> Products formerly Haswell
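As a cross-check that does not depend on the ARK page, the "cpu family" and "model" fields in the /proc/cpuinfo output above reflect the same CPUID family/model values that GCC's native detection uses, so model 63 (0x3f) is what a given GCC version has to recognize for -mtune=native to pick a specific tuning. The following is a minimal shell sketch; the function name cpu_family_model is hypothetical, and it only reports the first processor's family and model in decimal and hex.

```shell
# Hypothetical helper: print the CPU family and model of the first
# processor found in /proc/cpuinfo-style text read from stdin.
cpu_family_model() {
  awk -F': *' '
    /^cpu family/ && !fam    { fam = $2 }   # "cpu family : 6"
    /^model[ \t]*:/ && !mod  { mod = $2 }   # "model : 63" (not "model name")
    fam && mod { printf "family %d, model %d (0x%x)\n", fam, mod, mod; exit }
  '
}

# Usage on a live Linux system:
#   cpu_family_model < /proc/cpuinfo
```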

If I use

gcc -march=haswell -Q --help=target|grep march
gcc -mtune=haswell -Q --help=target|grep mtune

I get "haswell" for both. So shouldn't I really be using haswell as the march value instead of core-avx2? What is the best choice?
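One way to answer this empirically is to compare the full set of options GCC derives from each -march value rather than only the -march= line. The sketch below (diff_target_opts is a hypothetical name) reads two saved gcc -Q --help=target listings and prints each option whose value differs, with < marking the first listing and > the second. If the ISA-related lines match, the remaining differences are presumably tuning-related, such as -mtune=. The order of the output lines is not significant.

```shell
# Hypothetical helper: print target options whose values differ between
# two "gcc -Q --help=target" listings given as file paths.
diff_target_opts() {
  awk 'NF >= 2 {
         # Map "option -> value" separately for each input file.
         if (FILENAME == ARGV[1]) a[$1] = $2; else b[$1] = $2
       }
       END {
         for (k in a) if (!(k in b) || a[k] != b[k]) print "< " k, a[k]
         for (k in b) if (!(k in a) || a[k] != b[k]) print "> " k, b[k]
       }' "$1" "$2"
}

# Usage (process substitution needs bash):
#   diff_target_opts <(gcc -march=native  -Q --help=target) \
#                    <(gcc -march=haswell -Q --help=target)
```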

By the way, I am using GCC 4.8.5 on CentOS 7.

Thanks!

Edit:

gcc -march=native -Q --help=target | grep -- '-march=' | cut -f3

-> core-avx2

gcc -mtune=native -Q --help=target | grep -- '-mtune=' | cut -f3

-> generic
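The two cut pipelines in the edit can be folded into a single pass over one listing. A minimal sketch, with show_march_mtune as a hypothetical name, that reads gcc -Q --help=target output on stdin and prints both resolved values, which makes it easy to spot when -mtune=native has fallen back to generic:

```shell
# Hypothetical helper: print the -march= and -mtune= values from
# "gcc -Q --help=target" output read on stdin.
show_march_mtune() {
  awk '$1 == "-march=" { m = $2 }
       $1 == "-mtune=" { t = $2 }
       END { printf "march=%s mtune=%s\n", m, t }'
}

# Usage: pass both options so a single listing shows both values.
#   gcc -march=native -mtune=native -Q --help=target | show_march_mtune
```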

Answer 1

Btw, I am using GCC 4.8.5 on CentOS7.

If performance is important, you should use a newer version of GCC. The 4.8 release series dates back to 2013 and lacks many performance enhancements that are present in current versions. Current versions have significantly expanded tuning options for x86, including -march settings for many processor families that did not exist in 2013.

Answer 2

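In the version of gcc you are using, Haswell is called core-avx2. Other microarchitectures also have awkward names. For example, Ivy Bridge, Sandy Bridge and Westmere are called core-avx-i, corei7-avx and corei7, respectively. Starting with gcc 4.9.0, the actual names of the microarchitectures are used, so when you run gcc -march=native -Q --help=target|grep march on a Haswell processor, gcc will print haswell instead of core-avx2 (see the patch).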
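If you need to relate the name an older GCC prints for -march=native to the microarchitecture, a small lookup is enough. The sketch below (new_march_name is hypothetical) covers only the Haswell, Ivy Bridge and Sandy Bridge names from the paragraph above and passes every other name through unchanged; extend the table as needed.

```shell
# Hypothetical helper: translate pre-4.9 GCC -march names mentioned above
# into microarchitecture names; any other name is passed through.
new_march_name() {
  case "$1" in
    core-avx2)  echo haswell ;;
    core-avx-i) echo ivybridge ;;
    corei7-avx) echo sandybridge ;;
    *)          echo "$1" ;;
  esac
}

# Usage with the -march=native value reported by an older GCC:
#   new_march_name "$(gcc -march=native -Q --help=target |
#                     awk '$1 == "-march=" { print $2 }')"
```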

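When -mtune=native is passed to gcc and the host processor is not known to the version of gcc you are using, it applies generic tuning. Your processor model (63) is only known to gcc 5.1.0 and later (see the patch).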
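Given the version thresholds in this answer (4.9.0 for the microarchitecture names, 5.1.0 for recognizing model 63), a quick check of whether the installed compiler is new enough could look like the sketch below. version_ge is a hypothetical helper, and it assumes GNU coreutils sort -V for version ordering.

```shell
# Hypothetical helper: succeed if version $1 >= version $2.
# Relies on GNU coreutils "sort -V" (version sort).
version_ge() {
  [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage. Newer GCC releases may print only the major version from
# -dumpversion (e.g. "9"); that still compares correctly here as long
# as the major version is above 5.
#   v=$(gcc -dumpversion)
#   version_ge "$v" 4.9.0 || echo "gcc $v: old -march names (core-avx2, ...)"
#   version_ge "$v" 5.1.0 || echo "gcc $v: -mtune=native may be generic on model 63"
```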

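The name-printing part of -Q --help=target has to pick some name for -march=native. For CPUs too new for your GCC to recognize specifically, it will pick something like Broadwell if the processor supports ADX, or otherwise the microarchitecture that supports the highest SIMD extension (up to AVX2) that the host processor supports, as detected by cpuid.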


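But the actual effect of -march=native is to enable all the appropriate -mavx -mpopcnt -mbmi2 -mcx16 and so on options, each of which is detected separately using cpuid. So for code-gen purposes, -march=native always enables the ISA extensions your GCC knows how to use, even if it doesn't recognize your CPU.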
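To make the per-feature nature of -march=native concrete, the sketch below does a similar mapping by hand: it reads the flags line of /proc/cpuinfo and prints matching GCC -m options. It is only an illustration under stated assumptions: the flag-to-option table covers a small hand-picked subset of extensions, cpuinfo_to_mflags is a hypothetical name, and GCC itself queries cpuid directly rather than parsing /proc/cpuinfo.

```shell
# Hypothetical helper: map a /proc/cpuinfo "flags" line (stdin) to GCC
# -m options, for a small hand-picked subset of ISA extensions only.
cpuinfo_to_mflags() {
  awk -F': *' '/^flags[ \t]*:/ {
    n = split($2, f, " ")           # whitespace-separated flag names
    out = ""
    for (i = 1; i <= n; i++) {
      if (f[i] ~ /^(avx|avx2|fma|bmi2|popcnt|cx16|movbe|f16c|aes)$/)
        out = out " -m" f[i]        # same name as the GCC option
      else if (f[i] == "bmi1")
        out = out " -mbmi"          # cpuinfo "bmi1" is GCC -mbmi
      else if (f[i] == "pclmulqdq")
        out = out " -mpclmul"
    }
    print substr(out, 2)            # drop the leading space
    exit
  }'
}

# Usage:
#   cpuinfo_to_mflags < /proc/cpuinfo
```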

But for setting tune options, -march=native or -mtune=native fails completely and falls back to generic when it can't recognize your CPU exactly. Unfortunately, it doesn't do something like tune=intel for unknown Intel CPUs.

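On your processor, gcc knows that it supports AVX2, so it assumes it's a Haswell processor (called core-avx2 in your gcc version), because AVX2 support started with Haswell, but it isn't sure that it actually is a Haswell processor. That's why it applies generic tuning instead of tuning for core-avx2 (i.e. Haswell). But in this case, I think this would have the same effect as tuning for core-avx2, because for that compiler version only Haswell supports AVX2, and the compiler knows that the host processor supports AVX2. In general, though, it may not tune for the native microarchitecture even if -march is guessed correctly on an unknown CPU.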
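Since the fallback is generic tuning rather than tuning for the guessed -march target, one workaround on an old compiler is to pass an explicit -mtune whenever the detected value comes back as generic. A minimal sketch, with pick_mtune as a hypothetical helper; it assumes the detected -march name is also a sensible tuning target, which is exactly the guess discussed above (for GCC 4.8, core-avx2 means Haswell).

```shell
# Hypothetical helper: given the -march ($1) and -mtune ($2) values GCC
# resolved for "native", tune for the -march target instead of "generic".
pick_mtune() {
  if [ "$2" = generic ]; then
    echo "$1"
  else
    echo "$2"
  fi
}

# Example flags for the machine in the question (GCC 4.8.5):
#   CFLAGS="-O2 -march=native -mtune=$(pick_mtune core-avx2 generic)"
#   # expands to: -O2 -march=native -mtune=core-avx2
```

An explicit -mtune given alongside -march=native should take precedence over the tuning that -march=native would otherwise select.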

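(Editor's note: no, tune=generic doesn't adapt to which instruction-set options are enabled. It's still fully generic tuning, including caring about CPUs like AMD Phenom or Intel Sandybridge that don't support AVX2. See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80568 and .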

This is one reason why you should use -march=native or -march=haswell (with a new enough gcc) rather than just -mavx2 -mfma. The other reason is that you'd probably forget -mbmi2 -mpopcnt -mcx16, and maybe even -mfma.)
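To see what the individual-flag approach can miss, you can compare the ISA macros GCC predefines under each set of options by preprocessing empty input with gcc -dM -E. A sketch follows; isa_macros is a hypothetical name and the macro list is a small hand-picked subset (__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16 should appear when cmpxchg16b, i.e. -mcx16, is enabled). Compared with -mavx2 -mfma alone, you would expect the -march=haswell side to list extra macros such as __BMI2__.

```shell
# Hypothetical helper: from "gcc -dM -E" output on stdin, print which of
# a few ISA-related macros are predefined, one per line, sorted.
isa_macros() {
  awk '$1 == "#define" { print $2 }' |
    grep -E '^__(AVX2|FMA|BMI|BMI2|POPCNT|F16C)__$|^__GCC_HAVE_SYNC_COMPARE_AND_SWAP_16$' |
    sort
}

# Usage (process substitution needs bash):
#   diff <(gcc -mavx2 -mfma   -dM -E - </dev/null | isa_macros) \
#        <(gcc -march=haswell -dM -E - </dev/null | isa_macros)
```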
