Why is the compiler allowed to optimize out this busy waiting loop?

#include <atomic>
#include <chrono>
#include <cstdio>
#include <thread>

int main()
{
    std::atomic<bool> ready{false};

    std::thread threadB = std::thread([&]() {
        // Busy-wait until thread A sets the flag.
        while (!ready) {}

        std::printf("Hello from B\n");
    });

    std::this_thread::sleep_for(std::chrono::seconds(1));

    std::printf("Hello from A\n");

    // Release thread B from its loop.
    ready = true;

    threadB.join();

    std::printf("Hello again from A\n");
}

This is an example from a CppCon talk: https://www.youtube.com/watch?v=F6Ipn7gCOsY&ab_channel=CppCon (around the 17-minute mark).

The goal is to print Hello from A first and only then let threadB proceed. Busy waiting should obviously be avoided, because it burns a lot of CPU time.
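For comparison, here is a minimal sketch of the same handshake without busy waiting, assuming a std::condition_variable is an acceptable replacement for the spin loop. The helper name run_with_condition_variable and the log vector are hypothetical; they are not from the talk and exist only so the ordering can be inspected.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical helper (not from the talk): runs the A/B handshake and
// returns the messages in the order they were produced.
std::vector<std::string> run_with_condition_variable()
{
    std::mutex m;
    std::condition_variable cv;
    bool ready = false;            // protected by m, so a plain bool suffices
    std::vector<std::string> log;  // also protected by m while B is running

    std::thread threadB([&] {
        std::unique_lock<std::mutex> lock(m);
        // Blocks instead of spinning; the predicate handles spurious wakeups
        // and the case where ready was set before B started waiting.
        cv.wait(lock, [&] { return ready; });
        assert(ready);
        log.push_back("Hello from B");
    });

    {
        std::lock_guard<std::mutex> lock(m);
        log.push_back("Hello from A");
        ready = true;
    }
    cv.notify_one();

    threadB.join();
    // After join() no other thread touches log, so no lock is needed.
    log.push_back("Hello again from A");
    return log;
}
```

Thread B sleeps inside cv.wait until A sets ready under the lock and notifies, so it uses no CPU while waiting.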

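The author says that the while (!ready) {} loop can be optimized by the compiler (by keeping the value of ready in a register), because the compiler sees that threadB never sleeps and therefore ready can never change. But even if the thread never sleeps, another thread can still change the value, right? There is no data race, because ready is atomic. The author states that this code is UB. Can someone explain why the compiler would be allowed to perform such an optimization?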

The author admitted in one of the comments under the video that he was wrong:

I had thought so, but it appears I was wrong; the compiler cannot hoist the atomic read out of the loop. The advice at @17:54 is still correct — you should still be very careful and beware of situations where the compiler might reorder or coalesce or eliminate atomic accesses in general — but this particular while-loop is NOT actually such a situation. For some (mostly theoretical) examples of how a compiler might optimize atomic access patterns, see JF Bastien's N4455 "No Sane Compiler Would Optimize Atomics" http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n4455.html
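To make the corrected claim concrete, here is a minimal sketch of the same spin loop with the atomic accesses spelled out. `!ready` on a std::atomic<bool> goes through the implicit conversion, which is a ready.load() with the default std::memory_order_seq_cst, so every iteration performs an atomic read. The helper name run_spin_handshake, the payload variable, the explicit acquire/release orderings, and the yield() call are my own assumptions for illustration; none of them come from the talk.

```cpp
#include <atomic>
#include <thread>

// Hypothetical helper (not from the talk): returns the value thread B
// observed once it left the spin loop.
int run_spin_handshake()
{
    std::atomic<bool> ready{false};
    int payload = 0;  // plain int, published through the release/acquire pair
    int seen = -1;    // written by B, read by A only after join()

    std::thread threadB([&] {
        // Every iteration is a real atomic load. Per the author's correction,
        // the compiler cannot hoist it out of the loop. With a plain
        // (non-atomic) bool this would be a data race, hence UB, and the
        // read could legally be hoisted.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();  // optional courtesy to the scheduler
        }
        seen = payload;  // guaranteed to observe 42 after the acquire load
    });

    payload = 42;
    ready.store(true, std::memory_order_release);

    threadB.join();
    return seen;
}
```

The release store in A paired with the acquire load in B is what makes the write to payload visible to B. The original code gets the same guarantee from the default sequentially consistent operations.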