BTB size for Haswell, Sandy Bridge, Ivy Bridge, and Skylake?

Is there any way to determine, or any resource that documents, the branch target buffer (BTB) size of Intel's Haswell, Sandy Bridge, Ivy Bridge, and Skylake processors?

Check Agner Fog's software optimization resources: http://www.agner.org/optimize/

The BTB should be described in "The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers", http://www.agner.org/optimize/microarchitecture.pdf

3.7 Branch prediction in Intel Sandy Bridge and Ivy Bridge

BTB organization. The branch target buffer in Sandy Bridge is bigger than in Nehalem according to unofficial rumors. It is unknown whether it has one level, as in Core 2 and earlier processors, or two levels as in Nehalem. It can handle a maximum of four call instructions per 16 bytes of code. Conditional jumps are less efficient if there are more than 3 branch instructions per 16 bytes of code.

3.8 Branch prediction in Intel Haswell, Broadwell and Skylake

BTB organization. The organization of the branch target buffer is unknown. It appears to be reasonably big.

Intel may give some data in section "3.4.1 Branch Prediction Optimization" of the "Intel 64 and IA-32 Architectures Optimization Reference Manual", http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html, but it still lists no sizes.

It may seem strange, but cpuid carried no information about the BTB back in 1998-2000: http://www.installaware.com/forums/oldattachments/02142006163/tstcpuid.c (by Gerald J. Heim, University of Tübingen, Germany). It is still not listed in http://www.felixcloutier.com/x86/CPUID.html or in other public Intel material. From the comments in tstcpuid.c:

 * This table describes the possible cache and TLB configurations
 * as documented by Intel. For now AMD doesn't use this but gives
 * exact cache layout data on CPUID 0x8000000x.
 *
 * MAX_CACHE_FEATURES_ITERATIONS limits the possible cache information
 * to 80 bytes (of which 16 bytes are used in generic Pentii2).
 * With 80 possible caches we are on the safe side for one or two years.
 *
 * Strange enough no BHT, BTB or return stack data is given this way...

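There should be some performance monitoring unit (PMU) counters related to the BTB, and there are experiments that derive the BTB size by running special test programs; see http://xania.org/201602/haswell-and-ivy-btb by Matt Godbolt: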

Conclusions

From these results, it seems Ivy Bridge (and therefore probably Sandy Bridge) uses pretty much the same strategy for BTB lookups of unconditional branches, albeit with a larger table size: 4096 entries split over 1024 sets of 4 ways.

For Haswell it seems a new approach for determining sets has been taken, along with a new approach to evicting entries.
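To make the "4096 entries split over 1024 sets of 4 ways" figure concrete, here is a toy Python model of a set-associative BTB. It is only a sketch: the class name `ToyBTB`, the helper `steady_state_misses`, the choice of index bits (address bits just above the low 4) and the LRU replacement policy are all assumptions made for illustration, not documented hardware behaviour, and it does not attempt to model whatever Haswell does.

```python
from collections import OrderedDict


class ToyBTB:
    """Toy set-associative BTB: sets * ways entries, LRU within each set."""

    def __init__(self, sets=1024, ways=4, index_shift=4):
        self.sets = sets
        self.ways = ways
        # Assumption: the set index is taken from the address bits
        # just above the low `index_shift` bits.
        self.index_shift = index_shift
        self.table = [OrderedDict() for _ in range(sets)]

    def set_index(self, addr):
        return (addr >> self.index_shift) % self.sets

    def lookup(self, addr):
        """Return True on a hit; on a miss, insert addr (evicting the LRU entry)."""
        entries = self.table[self.set_index(addr)]
        if addr in entries:
            entries.move_to_end(addr)  # mark as most recently used
            return True
        if len(entries) >= self.ways:
            entries.popitem(last=False)  # drop the least recently used entry
        entries[addr] = True
        return False


def steady_state_misses(btb, branch_addrs, warmup=2, rounds=3):
    """Walk the branch addresses repeatedly; count misses after warm-up."""
    for _ in range(warmup):
        for addr in branch_addrs:
            btb.lookup(addr)
    misses = 0
    for _ in range(rounds):
        for addr in branch_addrs:
            if not btb.lookup(addr):
                misses += 1
    return misses
```

In this model, 4096 branches spaced 16 bytes apart fill every set exactly and give no steady-state misses, while a 4097th branch pushes one set to 5 entries, and cycling through 5 entries in a 4-way LRU set misses every time. The same thrashing shows up with only 5 branches if they are spaced 16384 bytes apart, since they all land in one set. This is the kind of knee a size or associativity measurement looks for.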

He has also written more posts about branch prediction and the related events.

His code is public and based on Agner's tests: https://github.com/mattgodbolt/agner; see https://github.com/mattgodbolt/agner/blob/master/tests/btb_size.py and https://github.com/mattgodbolt/agner/blob/master/tests/branch.py
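As a rough idea of what such a test program might look like, the sketch below generates NASM-style source for a chain of unconditional jumps, where the number of branches and their address spacing can be varied independently. It is not taken from the repository above: the function name `jump_chain_asm`, the label names and the layout are hypothetical. A real measurement would assemble this, run it in a loop, and read a branch-misprediction counter from the PMU for each (count, spacing) pair.

```python
def jump_chain_asm(num_branches, spacing):
    """Return NASM-style source for a chain of unconditional jumps.

    Every jump sits at an address aligned to `spacing` bytes and targets
    the next aligned label, so the branches are `spacing` bytes apart.
    """
    if spacing < 2 or spacing & (spacing - 1):
        raise ValueError("spacing must be a power of two >= 2")
    lines = [
        "section .text",
        "global jump_chain",
        f"align {spacing}",
        "jump_chain:",
    ]
    for i in range(num_branches):
        lines.append(f"    jmp .target{i}")
        lines.append(f"    align {spacing}")  # pad up to the next boundary
        lines.append(f".target{i}:")
    lines.append("    ret")
    return "\n".join(lines) + "\n"
```

For example, `jump_chain_asm(4096, 16)` and `jump_chain_asm(4097, 16)` would produce two programs that differ by a single branch; a sharp rise in mispredictions between neighbouring counts is the signal that a buffer capacity has been exceeded.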