Is there an issue with "cache coherence" on C++ multi-threading on a Single CPU (Multi-Core) on Windows?

Question

(EDIT: To clarify: the "cache coherence" question concerns the case where atomic variables are not used.)

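Is it possible (single CPU case: Windows can run on top of Intel / AMD / Arm CPUs) that thread-1 runs on core-1 and stores a bool variable (for example), which stays in the L1 cache, while thread-2 runs on core-n and uses that variable, but looks at a different copy of it that is in memory?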

Code example (to demonstrate the issue, assume that std::atomic_bool is just a plain bool):

#include <thread>
#include <atomic>
#include <chrono>

std::atomic_bool g_exit{ false }, g_exited{ false };

using namespace std::chrono_literals;

void fn()
{
    while (!g_exit.load(std::memory_order_relaxed))
    {
        // do something (let's say it takes 1-4s, repeatedly)
        std::this_thread::sleep_for(1s);
    }

    g_exited.store(true, std::memory_order_relaxed);
}

int main()
{
    std::thread wt(fn);
    wt.detach();

    // do something (let's say it took 2s)
    std::this_thread::sleep_for(2s);

    // Exit

    g_exit.store(true, std::memory_order_relaxed);

    for (int i = 0; i < 5; i++) { // Timeout: 5 seconds.
        std::this_thread::sleep_for(1s);
        if (g_exited.load(std::memory_order_relaxed)) {
            break;
        }
    }
}

Answer 1

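CPU cache is always coherent across the cores that we run C++ threads across ¹, whether they are in the same package (a multi-core CPU) and/or spread across sockets with an interconnect. That makes it impossible to load a stale value once the writing thread's store has executed and committed to cache. As part of committing, the writing core sends an invalidate request to all other caches in the system.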

Other threads can always eventually see your updates to std::atomic variables, even with mo_relaxed. That is the whole point; std::atomic would be useless if it did not work for this.
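As a minimal sketch of that guarantee (not the asker's program): a worker spins on a relaxed atomic flag and the main thread sets it. The function name worker_sees_relaxed_store and the 10 ms delay are assumptions made up for this illustration.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical helper for illustration: returns true if the worker
// observed the relaxed store and left its loop.
bool worker_sees_relaxed_store()
{
    std::atomic<bool> stop{false};
    std::atomic<bool> stopped{false};

    std::thread worker([&] {
        // Even with memory_order_relaxed, every iteration performs a real
        // atomic load; the compiler may not keep the value in a register forever.
        while (!stop.load(std::memory_order_relaxed)) {
            std::this_thread::yield();
        }
        stopped.store(true, std::memory_order_relaxed);
    });

    // Arbitrary short delay so the worker is likely spinning already.
    std::this_thread::sleep_for(std::chrono::milliseconds(10));
    stop.store(true, std::memory_order_relaxed);

    // join() waits for the worker, so the final load below sees its store.
    worker.join();
    return stopped.load(std::memory_order_relaxed);
}
```

Calling worker_sees_relaxed_store() from main should return true after roughly the delay. Relaxed ordering only gives up ordering with respect to other variables, not eventual visibility of the flag itself.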

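But without std::atomic, your code would be badly broken, a classic example of the problem described in https://electronics.stackexchange.com/questions/387181/mcu-programming-c-o2-optimization-breaks-while-loop/387478#387478 : the compiler can assume that no other thread is writing a non-atomic variable it is reading, so it can hoist the actual load out of the loop and keep the value in a thread-private CPU register. So it is not re-reading from coherent cache at all. That is, while(!exit_now){} becomes if(!exit_now) while(1){} for a plain bool exit_now global.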
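The hoisting transformation can be sketched in single-threaded code. The max_iters bound and both function names are hypothetical, added only so the loops terminate and can be compared; a real compiler does this at the machine-code level, not in source.

```cpp
// Plain (non-atomic) global flag: the compiler may assume that no other
// thread writes it while this code is reading it.
bool exit_now = false;

// The loop as written in the source: the flag is tested on every iteration.
// max_iters is a hypothetical bound so that this sketch always terminates.
int spin_as_written(int max_iters)
{
    int iters = 0;
    while (!exit_now && iters < max_iters) {
        ++iters;
    }
    return iters;
}

// What an optimizer is allowed to produce: one load of the flag, hoisted
// out of the loop. After that the loop never looks at exit_now again.
int spin_as_hoisted(int max_iters)
{
    int iters = 0;
    if (!exit_now) {
        while (iters < max_iters) {
            ++iters;
        }
    }
    return iters;
}
```

In a single thread the two functions return the same result for any starting value of exit_now, which is why the transformation is legal for a plain bool. They only differ if another thread writes the flag mid-loop, and for a non-atomic variable that is a data race the compiler may assume never happens.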

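(Except that your sleep_for call would probably prevent that optimization. It is probably not declared pure, because you do not want compilers to optimize away multiple calls to it; the passage of time is its side effect. So the compiler has to assume that a call to it could modify global variables, and therefore re-reads the global from memory, using normal load instructions that go through coherent cache.)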

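Footnote 1: C++ implementations that support std::thread only run it across cores in the same coherency domain. In almost all systems there is only one coherency domain, which includes all cores in all sockets, but huge clusters with non-coherent shared memory between nodes are possible.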

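So are embedded boards with an ARM microcontroller core that shares memory with an ARM DSP core but is not coherent with it. You would not run a single OS across both of those cores, and you would not consider code running on those different cores to be part of the same C++ program.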

For more details about cache coherency, see When to use volatile with multi threading?
