Why does CLFLUSH exist in x86?

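I recently learned about the row hammer attack. To perform this attack, the programmer needs to flush the CPU's complete cache hierarchy for a specific number of addresses.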

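My question is: why is CLFLUSH necessary in x86? If all the L* caches work transparently (i.e. no explicit cache invalidation is needed), what is the reason for ever using this instruction? Besides that: isn't the CPU free to speculate on memory access patterns, and thereby ignore the instruction entirely?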

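I think the main use case is non-volatile DIMMs, especially Intel's Optane DC PM. It's normally mapped as write-back (WB) cacheable memory, so explicit flushes (or movnt stores) are needed to ensure data is committed to non-volatile storage.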
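The store-then-flush pattern described above can be sketched roughly as follows. This is only an illustration, not code from the answer: the helper names `flush_range` and `persist_copy` are hypothetical, the 64-byte line size is an assumption, and ordinary DRAM stands in for a persistent-memory mapping, so it shows the instruction sequence rather than real persistence.

```c
#include <emmintrin.h>  /* _mm_clflush, _mm_sfence (x86 SSE2 intrinsics) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_LINE 64   /* assumed line size; CPUID leaf 1 reports the real value */

/* Hypothetical helper: flush every cache line overlapping [addr, addr + len).
 * Returns the number of lines flushed. */
static size_t flush_range(const void *addr, size_t len)
{
    uintptr_t p   = (uintptr_t)addr & ~(uintptr_t)(CACHE_LINE - 1);
    uintptr_t end = (uintptr_t)addr + len;
    size_t lines  = 0;

    for (; p < end; p += CACHE_LINE) {
        _mm_clflush((const void *)p);  /* write back (if dirty) and evict the line */
        lines++;
    }
    /* Conservative fence: this is the form needed if the loop used the
     * weakly ordered CLFLUSHOPT instead of CLFLUSH. */
    _mm_sfence();
    return lines;
}

/* Hypothetical helper: ordinary cached stores followed by explicit flushes. */
static size_t persist_copy(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    return flush_range(dst, len);
}
```

A caller would use `persist_copy(dst, src, len)` wherever the data has to leave the CPU caches before the next step, for example before a log record is marked valid.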

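(But clflush was introduced back in the Pentium 4 days, at the same time as SSE2. I don't know what the thinking was then; possibly explicit cache control for performance reasons, like prefetch.)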

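Skylake introduced the weakly ordered, higher-performance CLFLUSHOPT because it is very useful for non-volatile storage attached directly to the memory hierarchy. Flushing the cache ensures the data is written out to actual memory instead of still sitting dirty in the CPU.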
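Because CLFLUSHOPT (and the later CLWB) only exist on newer CPUs, code usually checks for them at run time. A minimal sketch, assuming GCC or Clang on x86 with the `<cpuid.h>` helpers; the struct and function names are hypothetical:

```c
#include <cpuid.h>  /* __get_cpuid, __get_cpuid_count (GCC/Clang, x86 only) */

/* Hypothetical result type for this illustration. */
struct flush_features {
    int clflush;         /* CPUID.01H:EDX bit 19 */
    int clflushopt;      /* CPUID.(EAX=07H,ECX=0):EBX bit 23 */
    int clwb;            /* CPUID.(EAX=07H,ECX=0):EBX bit 24 */
    unsigned line_size;  /* CLFLUSH line size in bytes */
};

static struct flush_features detect_flush_features(void)
{
    struct flush_features f = {0, 0, 0, 0};
    unsigned eax, ebx, ecx, edx;

    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        f.clflush   = (edx >> 19) & 1;
        /* EBX bits 15:8 hold the CLFLUSH line size in 8-byte units. */
        f.line_size = ((ebx >> 8) & 0xff) * 8;
    }
    if (__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx)) {
        f.clflushopt = (ebx >> 23) & 1;
        f.clwb       = (ebx >> 24) & 1;
    }
    return f;
}
```

A flush routine could call `detect_flush_features()` once at startup and fall back to plain clflush when the newer instructions are not reported.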

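For some links and background on Optane DC PM (persistent memory), see also this SuperUser answer. It is non-volatile storage in physical address space, not just a software trick in virtual address space.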

Dan Luu's article on clwb and pcommit is interesting: it covers the benefits of taking the OS out of the way for access to storage, and details Intel's plans at that point for clflush / clwb and their memory-ordering semantics. It was written while Intel was still planning to require an instruction called pcommit (persistent commit) as part of this process, but Intel later decided to remove that instruction: "Deprecating the PCOMMIT Instruction" (from Intel) has some interesting information about why, and about how things work behind the scenes.

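It could also matter for non-cache-coherent DMA to devices, if that is something you can do on x86. (But x86 has had cache-coherent DMA ever since the first x86 CPUs with caches, to avoid breaking existing software.)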

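Apparently it is not possible to map MMIO / PCIe device memory regions as cacheable write-back (WB); see "how to do mmap for cacheable PCIe BAR". Possibly the P4 architects had future possibilities in mind when they introduced the instruction.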

In the previous link, Dr. Bandwidth mentions a partial workaround that actually requires CLFLUSH for correctness:

map the MMIO range twice -- once for store operations from the processor to the FPGA using the Write-Combining (WC) memory type, and once for reads from the processor to the FPGA using the Write Protect (WP) or Write Through (WT) types. You will need to maintain coherence manually by using CLFLUSH on cache lines in the "read only" region when you write to the alias of that line in the "write only" region.

So it is possible to construct situations where you might need clflush, other than NV-DIMMs.

x86

cpu-architecture

cache-invalidation

cpu-cache

persistent-memory