C++ atomic increment with memory ordering

After reading chapter 5 of C++ Concurrency in Action, I tried writing some code to test my understanding of memory ordering:

#include <iostream>
#include <vector>
#include <thread>
#include <atomic>

std::atomic<int> one,two,three,sync;

void func(int i){
    while(i != sync.load(std::memory_order_acquire));
    auto on = one.load(std::memory_order_relaxed); ++on;
    auto tw = two.load(std::memory_order_relaxed); ++tw;
    auto th = three.load(std::memory_order_relaxed); ++th;
    std::cout << on << tw << th << std::endl;
    one.store(on,std::memory_order_relaxed);
    two.store(tw,std::memory_order_relaxed);
    three.store(th,std::memory_order_relaxed);
    int expected = i;
    while(!sync.compare_exchange_strong(expected,i+1,
            std::memory_order_acq_rel))
        expected = i;
}

int main(){
    std::vector<std::thread> t_vec;
    for(auto i = 0; i != 5; ++i)
        t_vec.push_back(std::thread(func,i));
    for(auto i = 0; i != 5; ++i)
        t_vec[i].join();
    std::cout << one << std::endl;
    std::cout << two << std::endl;
    std::cout << three << std::endl;
    return 0;
}

My question is: the book says that memory_order_release and memory_order_acquire must be used as a pair for the right value to be read correctly.

So if the first line of func() loads sync in a loop with memory_order_acquire, it should break the pair and produce an unpredictable synchronization error.

However, when compiled on my x86 platform, it prints exactly what I expected:

111
222
333
444
555
5
5
5

The result shows no problem at all. So I am just wondering what actually happens inside func() (even though I wrote it myself...)?

Added: according to the code on page 141 of C++ Concurrency in Action:

#include <atomic>
#include <thread>
#include <vector>

std::vector<int> queue_data;
std::atomic<int> count;

void populate_queue(){
    unsigned const number_of_items = 20;
    queue_data.clear();
    for(unsigned i = 0; i < number_of_items; ++i)
        queue_data.push_back(i);
    count.store(number_of_items, std::memory_order_release);
}

void consume_queue_items(){
    while(true){
        int item_index;
        if((item_index=count.fetch_sub(1,std::memory_order_acquire))<=0){
            wait_for_more_items();
            continue;
        }
        process(queue_data[item_index-1]);
    }
}

int main(){
    std::thread a(populate_queue);
    std::thread b(consume_queue_items);
    std::thread c(consume_queue_items);
    a.join();
    b.join();
    c.join();
}

Threads b and c both work correctly no matter which of them accesses the counter first, because:

Thankfully, the first fetch_sub() does participate in the release sequence, and so the store() synchronizes-with the second fetch_sub(). There's still no synchronizes-with relationship between the two consumer threads. There can be any number of links in the chain, but provided they're all read-modify-write operations such as fetch_sub(), the store() will still synchronize-with each one that's tagged memory_order_acquire. In this example, all the links are the same, and all are acquire operations, but they could be a mix of different operations with different memory-ordering semantics.

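But I cannot find any related material on this. How do read-modify-write operations such as fetch_sub() participate in the release sequence? If I change it to a load with memory_order_acquire, will store() still synchronize-with load() in each independent thread?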

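Your code implements a basic spinlock mutex in which each thread acquires the lock implicitly, by recognizing its own value in sync rather than by changing the state.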

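The memory ordering is correct, and even stronger than technically necessary. The compare_exchange_strong at the bottom is not needed; a plain store with release ordering is enough: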

sync.store(i+1, std::memory_order_release);

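The relaxed operations may be reordered among themselves, but that does not change the program's output. There is no undefined behavior, and the same output is guaranteed on all platforms.
In fact, one, two, and three do not even have to be atomic, because they are only accessed inside your spinlock mutex and after all threads have been joined.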
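The two observations above can be sketched together. This is only a minimal illustration, assuming the same turn-passing scheme as in the question: the counters are plain ints and the hand-off is a release store. The names Counters, turn, and run_turns are made up for this sketch and do not appear in the question.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical names for this sketch; only the turn counter is atomic.
struct Counters {
    int one = 0, two = 0, three = 0;  // plain ints, protected by the turn
};

Counters run_turns(int n_threads) {
    assert(n_threads >= 0);
    Counters c;
    std::atomic<int> turn{0};

    auto worker = [&c, &turn](int i) {
        // Acquire: spin until the previous thread's release store is visible.
        while (turn.load(std::memory_order_acquire) != i)
            std::this_thread::yield();
        ++c.one;
        ++c.two;
        ++c.three;
        // Release: publish the increments and hand the turn to thread i + 1.
        turn.store(i + 1, std::memory_order_release);
    };

    std::vector<std::thread> threads;
    for (int i = 0; i != n_threads; ++i)
        threads.emplace_back(worker, i);
    for (auto& t : threads)
        t.join();
    return c;  // join() orders every worker's writes before this read
}
```

Calling run_turns(5) should leave all three counters at 5, matching the last three lines of the output in the question.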

So if the first line of func() loads sync in a loop with memory_order_acquire, it should break the pair and produce an unpredictable synchronization error.

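The acquire/release pairing is correct, because the release operation at the bottom (in thread X) pairs with the acquire operation at the top (in thread Y). It is fine for the first thread to acquire without a preceding release, because at that point there is nothing to release yet.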

About the "Added" part:

How do read-modify-write operations such as fetch_sub() participate in the release sequence?

This is what the standard says in 1.10.1-5:

A release sequence headed by a release operation A on an atomic object M is a maximal contiguous subsequence of side effects in the modification order of M, where the first operation is A, and every subsequent operation:

  • is performed by the same thread that performed A, or
  • is an atomic read-modify-write operation.

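So, in order to release data to another processor, the load/acquire operation needs to observe the value stored by the release operation, or a later value in the modification order, as long as that later value was written by an operation meeting one of these requirements.
Apparently, read-modify-write operations have additional properties that prevent updates to the atomic variable from reaching other processors in a less well-defined order.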
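The rule quoted above can be exercised with a small variation on the book's listing. This is a sketch under stated assumptions rather than the book's code: the function name consume_all, the per-consumer sums array, and the relaxed pre-wait loop are all invented here so that the example terminates and returns a checkable result.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: one producer, two consumers, returns the sum of
// every item the consumers read from the queue.
long consume_all(unsigned number_of_items) {
    assert(number_of_items > 0);  // with 0 items the consumers would spin forever
    std::vector<int> queue_data;
    std::atomic<int> count{0};
    long sums[2] = {0, 0};        // one slot per consumer, so no shared writes

    std::thread producer([&] {
        for (unsigned i = 0; i < number_of_items; ++i)
            queue_data.push_back(static_cast<int>(i));
        // Release store: heads the release sequence on count.
        count.store(static_cast<int>(number_of_items), std::memory_order_release);
    });

    auto consumer = [&](int id) {
        // Relaxed wait: provides no synchronization, it only delays the loop.
        while (count.load(std::memory_order_relaxed) == 0)
            std::this_thread::yield();
        int item_index;
        // Each fetch_sub is a read-modify-write, so it extends the release
        // sequence, and its acquire side synchronizes with the producer's store.
        while ((item_index = count.fetch_sub(1, std::memory_order_acquire)) > 0)
            sums[id] += queue_data[item_index - 1];
    };

    std::thread b(consumer, 0), c(consumer, 1);
    producer.join();
    b.join();
    c.join();
    return sums[0] + sums[1];
}
```

Each index from number_of_items down to 1 is handed out exactly once, so consume_all(20) should return 0 + 1 + ... + 19 = 190 regardless of how the two consumers interleave.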

If I change it to a load with memory_order_acquire, will store() still synchronize-with load() in each independent thread?

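If you change the read-modify-write into a separate load/acquire (one that sees the updated value) followed by a store/release, it is still correct, but it is no longer the same release sequence; you have created a separate release sequence.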
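A minimal sketch of that situation, assuming a three-thread relay invented for illustration (the names relay_value, payload, and stage are not from the question or the book): thread b uses a separate acquire load and release store, so thread c synchronizes with b rather than with a, and still sees a's data because happens-before is transitive here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical relay: a publishes, b forwards, c reads.
int relay_value() {
    int payload = 0;            // plain data written by thread a
    std::atomic<int> stage{0};
    int seen = -1;

    std::thread a([&] {
        payload = 42;
        // Heads the first release sequence on stage.
        stage.store(1, std::memory_order_release);
    });
    std::thread b([&] {
        // Acquire load: synchronizes with a's release store.
        while (stage.load(std::memory_order_acquire) != 1)
            std::this_thread::yield();
        // A plain store from another thread is not part of a's release
        // sequence; as a release store it heads a second, separate one.
        stage.store(2, std::memory_order_release);
    });
    std::thread c([&] {
        // Synchronizes with b's store, which is ordered after b's load.
        while (stage.load(std::memory_order_acquire) != 2)
            std::this_thread::yield();
        seen = payload;
    });

    a.join();
    b.join();
    c.join();
    return seen;
}
```

If b's store used memory_order_relaxed instead, c's load would, as far as I can tell from the quoted rule, no longer synchronize with anything that orders the write to payload, and the read in c would be a data race.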