独立的读-修改-写顺序

Question

我运行通过 Relacy 一堆算法来验证它们的正确性，我偶然发现了一些我并不真正理解的东西。这是它的简化版本：

#include <thread>
#include <atomic>
#include <iostream>
#include <cassert> 

struct RMW_Ordering
{
    std::atomic<bool> flag {false};
    std::atomic<unsigned> done {0}, counter {0};
    unsigned race_cancel {0}, race_success {0}, sum {0};

    void thread1() // fail
    {
        race_cancel = 1; // data produced

        if (counter.fetch_add(1, std::memory_order_release) == 1 &&
            !flag.exchange(true, std::memory_order_relaxed))
        {
            counter.store(0, std::memory_order_relaxed);
            done.store(1, std::memory_order_relaxed);
        }
    }

    void thread2() // success
    {
        race_success = 1; // data produced

        if (counter.fetch_add(1, std::memory_order_release) == 1 &&
            !flag.exchange(true, std::memory_order_relaxed))
        {
            done.store(2, std::memory_order_relaxed);
        }
    }

    void thread3()
    {
        while (!done.load(std::memory_order_relaxed)); // livelock test
        counter.exchange(0, std::memory_order_acquire);
        sum = race_cancel + race_success;
    }
};

int main()
{
    for (unsigned i = 0; i < 1000; ++i)
    {
        RMW_Ordering test;

        std::thread t1([&]() { test.thread1(); });    
        std::thread t2([&]() { test.thread2(); });
        std::thread t3([&]() { test.thread3(); });

        t1.join();
        t2.join();
        t3.join();

        assert(test.counter == 0);
    }

    std::cout << "Done!" << std::endl;
}

两个线程竞相进入受保护区域，最后一个修改完成，从无限循环中释放第三个线程。该示例有点做作，但原始代码需要通过标志声明此区域以发出信号 "done".

最初，fetch_add 有 acq_rel 订购，因为我担心交易所可能会在之前重新订购它可能会导致一个线程要求该标志，首先尝试 fetch_add 检查，并阻止另一个线程（通过增量检查）成功修改调度。在使用 Relacy 进行测试时，我想如果我从 acq_rel 切换到 release[=44，我会看到我预期会发生的活锁是否会发生=]，令我惊讶的是，它没有。然后我对所有内容都使用了 relaxed，同样，没有活锁。

我试图在 C++ 标准中找到与此相关的任何规则，但只设法挖掘出这些：

1.10.7 In addition, there are relaxed atomic operations, which are not synchronization operations, and atomic read-modify-write operations, which have special characteristics.

29.3.11 Atomic read-modify-write operations shall always read the last value (in the modification order) written before the write associated with the read-modify-write operation.

我能否始终依赖 RMW 操作不被重新排序 - 即使它们影响不同的内存位置 - 标准中是否有任何保证这种行为的内容？

编辑:

我想出了一个更简单的设置，应该可以更好地说明我的问题。这是它的 CppMem 脚本：

int main() { atomic_int x = 0; atomic_int y = 0; {{{ { if (cas_strong_explicit(&x, 0, 1, relaxed, relaxed)) { cas_strong_explicit(&y, 0, 1, relaxed, relaxed); } } ||| { if (cas_strong_explicit(&x, 0, 2, relaxed, relaxed)) { cas_strong_explicit(&y, 0, 2, relaxed, relaxed); } } ||| { // Is it possible for x and y to read 2 and 1, or 1 and 2? x.load(relaxed).readsvalue(2); y.load(relaxed).readsvalue(1); } }}} return 0; }

我认为该工具不够复杂，无法评估这种情况，尽管它似乎确实表明这是可能的。这是几乎等效的 Relacy 设置：

#include "relacy/relacy_std.hpp" struct rmw_experiment : rl::test_suite<rmw_experiment, 3> { rl::atomic<unsigned> x, y; void before() { x($) = y($) = 0; } void thread(unsigned tid) { if (tid == 0) { unsigned exp1 = 0; if (x($).compare_exchange_strong(exp1, 1, rl::mo_relaxed)) { unsigned exp2 = 0; y($).compare_exchange_strong(exp2, 1, rl::mo_relaxed); } } else if (tid == 1) { unsigned exp1 = 0; if (x($).compare_exchange_strong(exp1, 2, rl::mo_relaxed)) { unsigned exp2 = 0; y($).compare_exchange_strong(exp2, 2, rl::mo_relaxed); } } else { while (!(x($).load(rl::mo_relaxed) && y($).load(rl::mo_relaxed))); RL_ASSERT(x($) == y($)); } } }; int main() { rl::simulate<rmw_experiment>(); }

从未违反断言，因此根据 Relacy，1 和 2（或相反）是不可能的。

Answer 1

我还没有完全理解你的代码，但粗体问题有一个简单的答案：

Can I always rely on RMW operations not being reordered - even if they affect different memory locations

不，你不能。非常允许在同一线程中对两个宽松的 RMW 进行编译时重新排序。（我认为运行两个 RMW 的时间重新排序在大多数 CPU 上实际上是不可能的。ISO C++ 不区分编译时间和运行-时间。）

但请注意，原子 RMW 包括加载和存储，并且这两部分必须保持在一起。所以任何类型的 RMW 都不能更早地通过获取操作，或者更晚地通过释放操作。

此外，当然 RMW 本身是一个释放 and/or 获取操作可以停止在一个或另一个方向上重新排序。

当然，C++ 内存模型并未根据对高速缓存一致的共享内存访问的本地重新排序进行正式定义，仅根据与另一个线程同步和创建发生在之前/之后的关系进行定义。但是，如果您忽略 IRIW 重新排序（2 reader 线程不同意两个编写器线程对不同变量进行独立存储的顺序），那么对同一事物建模几乎有两种不同的方法。

Answer 2

在您的第一个示例中，保证 flag.exchange 始终在 counter.fetch_add 之后执行，因为 && 短路 -即，如果第一个表达式解析为 false，则永远不会执行第二个表达式。 C++ 标准保证了这一点，因此编译器无法重新排序这两个表达式（无论它们使用哪种内存顺序）。

正如 Peter Cordes 已经解释的那样，C++ 标准没有说明是否或何时可以根据原子操作对指令进行重新排序。通常，大多数编译器优化依赖于 as-if:

The semantic descriptions in this International Standard deﬁne a parameterized nondeterministic abstract machine. This International Standard places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine [..].

This provision is sometimes called the “as-if” rule, because an implementation is free to disregard any requirement of this International Standard as long as the result is as if the requirement had been obeyed, as far as can be determined from the observable behavior of the program. For instance, an actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no side eﬀects aﬀecting the observable behavior of the program are produced.

这里的关键方面是 "observable behavior"。假设您在两个不同的原子对象上有两个松弛的原子负载 A 和 B，其中 A 是有序的在 B.

之前

std::atomic<int> x, y;

x.load(std::memory_order_relaxed); // A
y.load(std::memory_order_relaxed); // B

先序关系是先行关系定义的一部分，因此可能会假设这两个操作不能重新排序。但是，由于这两个操作放宽了，所以"observable behavior"并不能保证，即即使按照原来的顺序，x.load（A）也可以return 比 y.load (B) 更新的结果，因此编译器可以自由地重新排序它们，因为 最终程序不会能够区分（即，可观察到的行为是等价的）。如果它不等价，那么你就会有竞争条件！ ;-)

要防止此类重新排序，您必须依赖（线程间）先行关系。如果 x.load (A) 将使用 memory_order_acquire，那么编译器将不得不假设此操作与某些释放操作同步，从而建立一个 (inter -thread) happens-before 关系。假设其他某个线程执行两个原子更新：

y.store(42, std::memory_order_relaxed); // C
x.store(1, std::memory_order_release); // D

如果acquire-load A看到store-release D存储的值，那么这两个操作是同步的, 从而建立一个 happens-before 关系。由于 y.store 排在 x.store 之前，而 x.load 排在前面，happens-before 关系的传递性保证 y.store happens-before y.load。对两个加载或两个存储重新排序会破坏这种保证，因此也会改变可观察到的行为。因此，编译器无法执行此类重新排序。

一般来说，争论可能的重新排序是错误的做法。在第一步中，您应该始终确定所需的先行关系（例如，y.store 必须先于 y.load 发生）。下一步是确保在所有情况下都正确建立这些先行发生关系。至少我是这样处理无锁算法实现的正确性参数的。

关于Relacy：Relacy 仅模拟内存模型，但它依赖于编译器生成的操作顺序。因此，即使编译器可以重新排序两条指令，但选择不这样做，您将无法使用 Relacy 识别它。

独立的读-修改-写顺序

Independent Read-Modify-Write Ordering

c++

atomic

memory-model

memory-barriers

stdatomic