Cache coherence literature generally refers only to store buffers, not read buffers. Yet somehow one needs both?

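When reading about consistency models (namely x86 TSO), authors generally resort to models with a number of CPUs, each with its associated store buffer and private cache.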

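If my understanding is correct, a store buffer can be described as a queue into which a CPU puts any store it wants to commit to memory. So, as the name says, they are store buffers.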
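As a rough illustration of that queue picture, here is a toy Python model. It is a sketch, not a description of real hardware: the class name `StoreBuffer`, its methods, and the dict standing in for memory are all invented for this example.

```python
from collections import deque


class StoreBuffer:
    """Toy FIFO store buffer in front of a shared memory (a plain dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.pending = deque()  # oldest store on the left

    def store(self, addr, value):
        # The core is done with the store once it is queued; memory is untouched.
        self.pending.append((addr, value))

    def drain_one(self):
        # Commit the oldest pending store, making it globally visible.
        addr, value = self.pending.popleft()
        self.memory[addr] = value


memory = {"x": 0, "y": 0}
sb = StoreBuffer(memory)
sb.store("x", 1)
sb.store("y", 2)
before = dict(memory)         # nothing committed yet
sb.drain_one()
after_first = dict(memory)    # only the older store (to x) is visible
sb.drain_one()
after_second = dict(memory)   # both stores visible, in program order
```

Only stores ever enter the queue in this model; loads never do.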

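But when I read those papers, they tend to talk about the interaction of loads and stores, with statements such as "a later load can pass an earlier store". That is slightly confusing, because they almost seem to be saying that the store buffer holds both loads and stores, when it doesn't. Right?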

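So there must also be a load buffer that they are not (at least explicitly) talking about. Plus, the two must be synchronized somehow, so that both know when it is acceptable to load from memory and to commit to memory. Or am I missing something?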

Can anyone shed some more light on this?

EDIT:

Let's look at a paragraph from "A primer on memory consistency and cache coherence":

To understand the implementation of atomic RMWs in TSO, we consider the RMW as a load immediately followed by a store. The load part of the RMW cannot pass earlier loads due to TSO’s ordering rules. It might at first appear that the load part of the RMW could pass earlier stores in the write buffer, but this is not legal. If the load part of the RMW passes an earlier store, then the store part of the RMW would also have to pass the earlier store because the RMW is an atomic pair. But because stores are not allowed to pass each other in TSO, the load part of the RMW cannot pass an earlier store either

More specifically,

The load part of the RMW cannot pass earlier loads due to TSO’s ordering rules. It might at first appear that the load part of the RMW could pass earlier stores in the write buffer

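So they are referring to loads and stores passing each other in the write buffer (which I assume is the same thing as the store buffer?).

Thanks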

Yes, write buffer = store buffer.

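They are talking about what would happen if an atomic RMW were split into a separate load and store, and the store buffer delayed another store (to a separate address) so that it became visible after the RMW's load but still before the RMW's store.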

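Obviously that would make the RMW non-atomic, and it would violate the requirement that all x86 atomic RMW operations are also full barriers. (The lock prefix implies that, too.)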

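Normally that would be hard for a reader to detect, but if the "separate address" is contiguous with the atomic RMW, then e.g. a dword store plus a dword RMW could be observed by another thread doing a 64-bit qword load of both as one atomic operation.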
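To make the non-atomicity concrete, here is a toy Python simulation of one hand-picked interleaving. It shows a different symptom from the qword-load observation, namely a lost update, because that is the easiest one to reproduce in a few lines. Everything in it is an assumption made for illustration: the function `run`, the addresses `flag` and `counter`, and the "split RMW" design itself, which real x86 hardware does not implement.

```python
from collections import deque


def run(split_rmw):
    """Return final memory for one hand-picked interleaving of two cores."""
    memory = {"flag": 0, "counter": 0}
    buf0 = deque()                          # core 0's store buffer (FIFO)

    buf0.append(("flag", 1))                # core 0: earlier plain store, buffered

    if split_rmw:
        # Hypothetical broken design: the load half passes the buffered store.
        old = memory["counter"]             # core 0: load half of the RMW
        memory["counter"] += 1              # core 1: its own atomic increment
        buf0.append(("counter", old + 1))   # core 0: store half, queued in order
    while buf0:                             # drain core 0's buffer in FIFO order
        addr, value = buf0.popleft()
        memory[addr] = value
    if not split_rmw:
        # Buffer drained first, then load+store happen as one indivisible step.
        memory["counter"] += 1              # core 0: atomic increment
        memory["counter"] += 1              # core 1: atomic increment
    return memory


broken = run(split_rmw=True)
correct = run(split_rmw=False)
```

In the split version core 1's increment lands between the two halves of core 0's RMW and is overwritten, so the counter ends at 1 instead of 2.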

Re: the title question:

Load buffers don't cause reordering. They wait for data that hasn't arrived yet; a load finishes "executing" when it reads its data.

Store buffers are fundamentally different: they hold data for some time before it becomes globally visible.
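The difference can be seen in the classic store-buffering litmus test, run here on a toy Python model with one interleaving chosen by hand. The `Core` class and its methods are hypothetical names for this sketch. Each load reads memory while the same core's earlier store is still sitting in its store buffer; that is all "a later load can pass an earlier store" means, and no load ever enters the store buffer.

```python
from collections import deque


class Core:
    """Toy core: stores go through a private FIFO buffer, loads read memory."""

    def __init__(self, memory):
        self.memory = memory
        self.store_buffer = deque()

    def store(self, addr, value):
        self.store_buffer.append((addr, value))

    def load(self, addr):
        # No store forwarding modeled: this test never reloads its own store.
        return self.memory[addr]

    def drain(self):
        while self.store_buffer:
            addr, value = self.store_buffer.popleft()
            self.memory[addr] = value


memory = {"x": 0, "y": 0}
c0, c1 = Core(memory), Core(memory)

c0.store("x", 1)      # core 0: x = 1   (buffered)
c1.store("y", 1)      # core 1: y = 1   (buffered)
r0 = c0.load("y")     # core 0: r0 = y  (reads memory before y = 1 commits)
r1 = c1.load("x")     # core 1: r1 = x  (reads memory before x = 1 commits)
c0.drain()
c1.drain()
```

Both `r0` and `r1` end up 0 here, an outcome that no sequentially consistent interleaving of the four operations allows.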

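x86's TSO memory model can be described as sequential consistency plus a store buffer (with store forwarding). See also the linked answer and the comments on it for more discussion of why merely allowing StoreLoad reordering is not sufficient to describe the case of a thread reloading data it has just stored, especially if the load only partially overlaps a recent store, so that the hardware merges data from the store buffer with data from L1d to complete the load before the store is globally visible.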
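A minimal sketch of store forwarding with a partially overlapping load, again as a toy Python model rather than a hardware description. The `Core` class is hypothetical, and a shared `bytearray` stands in for L1d / coherent memory.

```python
class Core:
    """Toy core with a private store buffer and byte-granular forwarding."""

    def __init__(self, memory):
        self.memory = memory        # shared bytearray (globally visible state)
        self.store_buffer = []      # list of (addr, bytes), oldest first

    def store(self, addr, data):
        self.store_buffer.append((addr, bytes(data)))

    def load(self, addr, size):
        # Start from the globally visible bytes, then overlay this core's own
        # pending stores, oldest first so that the youngest store wins.
        result = bytearray(self.memory[addr:addr + size])
        for s_addr, data in self.store_buffer:
            for i, byte in enumerate(data):
                offset = s_addr + i - addr
                if 0 <= offset < size:
                    result[offset] = byte
        return bytes(result)

    def drain(self):
        for s_addr, data in self.store_buffer:
            self.memory[s_addr:s_addr + len(data)] = data
        self.store_buffer.clear()


memory = bytearray(b"\x00\x00\xaa\xbb")
c0, c1 = Core(memory), Core(memory)

c0.store(0, b"\x11\x22")      # 2-byte store, still buffered
own_view = c0.load(0, 4)      # 4-byte load partially overlaps the store
other_view = c1.load(0, 4)    # the other core cannot see the buffered store
c0.drain()
later_view = c1.load(0, 4)    # visible to everyone once committed
```

The storing core sees its own new bytes merged with old memory contents before any other core can see the store at all, which is why "StoreLoad reordering" alone undersells what the store buffer does.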

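Also note that x86 CPUs do speculatively reorder loads (at least Intel's do), but they shoot down mis-speculation to preserve the TSO memory model's guarantee of no LoadLoad or LoadStore reordering. CPUs therefore have to track the ordering of loads relative to stores. Intel calls the combined store + load buffer tracking structure the "memory order buffer" (MOB).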
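The idea of executing a load early and then shooting it down can be sketched as follows. This is a heavily simplified toy in Python and makes no claim about how Intel's MOB is actually organized. The names `load_buffer`, `speculative_load`, `remote_store` and `retire_load` are invented for the example.

```python
memory = {"data": 0, "ready": 0}
load_buffer = {}   # addr -> value, for loads executed early but not yet retired
replays = []       # addresses whose speculative result had to be thrown away


def speculative_load(addr):
    load_buffer[addr] = memory[addr]


def remote_store(addr, value):
    # Another core's store commits; the invalidation squashes any
    # not-yet-retired load of the same address on this core.
    memory[addr] = value
    if addr in load_buffer:
        del load_buffer[addr]
        replays.append(addr)


def retire_load(addr):
    # Retire in program order; execute (or re-execute) the load if needed.
    if addr not in load_buffer:
        speculative_load(addr)
    return load_buffer.pop(addr)


# Program order on this core: r1 = ready; r2 = data
speculative_load("data")     # the younger load runs early and sees 0
remote_store("data", 42)     # other core: data = 42
remote_store("ready", 1)     # other core: ready = 1
r1 = retire_load("ready")
r2 = retire_load("data")
```

Without the squash, this core would retire `ready == 1` together with the stale `data == 0`, which is a LoadLoad reordering that TSO forbids. With it, the early load is replayed and the architectural result is `(1, 42)`.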

concurrency

x86

cpu-architecture

memory-model