fetch_add with acq_rel memory order

Consider an atomic counter:

std::atomic<int> x(0);

Suppose I have a function that does the following:

int x_old = x.fetch_add(1,std::memory_order_acq_rel);

Based on the description of the memory orderings, including acquire-release:

memory_order_relaxed Relaxed operation: there are no synchronization or ordering constraints, only atomicity is required of this operation (see Relaxed ordering below)

memory_order_consume A load operation with this memory order performs a consume operation on the affected memory location: no reads or writes in the current thread dependent on the value currently loaded can be reordered before this load. Writes to data-dependent variables in other threads that release the same atomic variable are visible in the current thread. On most platforms, this affects compiler optimizations only (see Release-Consume ordering below)

memory_order_acquire A load operation with this memory order performs the acquire operation on the affected memory location: no reads or writes in the current thread can be reordered before this load. All writes in other threads that release the same atomic variable are visible in the current thread (see Release-Acquire ordering below)

memory_order_release A store operation with this memory order performs the release operation: no reads or writes in the current thread can be reordered after this store. All writes in the current thread are visible in other threads that acquire the same atomic variable (see Release-Acquire ordering below) and writes that carry a dependency into the atomic variable become visible in other threads that consume the same atomic (see Release-Consume ordering below).

memory_order_acq_rel A read-modify-write operation with this memory order is both an acquire operation and a release operation. No memory reads or writes in the current thread can be reordered before or after this store. All writes in other threads that release the same atomic variable are visible before the modification and the modification is visible in other threads that acquire the same atomic variable.

memory_order_seq_cst Any operation with this memory order is both an acquire operation and a release operation, plus a single total order exists in which all threads observe all modifications in the same order (see Sequentially-consistent ordering below)

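Is it possible for 2 distinct threads to receive the same x_old value of 0? Or are they guaranteed to execute in such a way that x_old is 0 for only one of them and 1 for the other?

If it is true that x_old can be 0 for both of them, does changing the memory order to std::memory_order_seq_cst guarantee the uniqueness of x_old?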

Is it possible for 2 distinct threads to receive the same x_old value of 0?

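No, it is not possible, because the operation is atomic. It either happens in full or does not happen at all. An atomic read-modify-write operation always reads the latest value in the modification order of x, so two fetch_add calls can never return the same old value, whatever memory order is used.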
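As a rough check of the uniqueness claim, here is a minimal sketch that has several threads call the same fetch_add as in the question and then verifies that no old value was handed out twice. The helper name all_old_values_unique and its parameters are made up for illustration. A passing run does not prove the guarantee (the C++ memory model does that); it only shows the behavior you should expect.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not from the question): returns true when no two
// fetch_add calls received the same old value.
bool all_old_values_unique(int num_threads, int per_thread) {
    std::atomic<int> x(0);
    // One private result vector per thread, so recording needs no locking.
    std::vector<std::vector<int>> got(static_cast<std::size_t>(num_threads));
    std::vector<std::thread> workers;
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&x, &got, t, per_thread] {
            auto& mine = got[static_cast<std::size_t>(t)];
            mine.reserve(static_cast<std::size_t>(per_thread));
            for (int i = 0; i < per_thread; ++i) {
                // Same call as in the question; the return value is the old value.
                mine.push_back(x.fetch_add(1, std::memory_order_acq_rel));
            }
        });
    }
    for (auto& w : workers) w.join();

    // Every old value must lie in [0, total) and must not repeat.
    const std::size_t total = static_cast<std::size_t>(num_threads) *
                              static_cast<std::size_t>(per_thread);
    std::vector<bool> seen(total, false);
    for (const auto& values : got) {
        for (int old : values) {
            if (old < 0 || static_cast<std::size_t>(old) >= total) return false;
            if (seen[static_cast<std::size_t>(old)]) return false;  // duplicate
            seen[static_cast<std::size_t>(old)] = true;
        }
    }
    return true;
}
```

For example, assert(all_old_values_unique(4, 10000)); should hold on any conforming implementation.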

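The memory order only governs how this operation is ordered relative to preceding and following loads and stores in the same thread. Since you have none, the ordering is irrelevant here. In other words, x.fetch_add(1, std::memory_order_relaxed); has the same effect here.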
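For contrast, here is a sketch of a case where the ordering does matter, because there is a plain store before the fetch_add. The function publish_and_read and the variable payload are hypothetical and not part of the question. The release half of acq_rel on the increment, paired with an acquire load in the reader, is what makes the earlier write to payload visible.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: 'payload' is ordinary, non-atomic data that is
// published to another thread through the atomic counter x.
int publish_and_read() {
    int payload = 0;
    std::atomic<int> x(0);
    int observed = -1;

    std::thread producer([&] {
        payload = 42;                               // preceding plain store
        x.fetch_add(1, std::memory_order_acq_rel);  // release half publishes it
    });
    std::thread consumer([&] {
        // Acquire load: once the increment is seen, payload == 42 is visible too.
        while (x.load(std::memory_order_acquire) == 0) {
            std::this_thread::yield();
        }
        observed = payload;
    });

    producer.join();
    consumer.join();
    return observed;
}
```

With memory_order_relaxed on both sides, the counter would still be incremented exactly once, but reading payload in the consumer would be a data race.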

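On current x86, the same lock xadd instruction is generated regardless of memory_order; the lock prefix provides both atomicity and ordering. For memory_order_relaxed, the ordering part of lock is unnecessary.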

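Any operation on memory is carried out inside the processor. Even for an atomic operation, the processor reads the value, modifies it, and writes the new value back. If the operation fails (depending on the implementation, it might block instead of failing), it is retried. For the operation to be correct when it succeeds, the value stored must be the immediately preceding value, modified as requested. The value returned to the caller comes from that same read; for fetch_add it is the value immediately before the modification. There is no reason for the processor to read from memory again and return a value from some arbitrary point in time. If the returned value were not the immediately preceding one, the resulting operation would be incorrect.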
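The read, modify, write back, retry-on-failure cycle described above can be sketched with a compare-exchange loop. This is only an illustration of the idea, not how the standard library or any particular processor implements fetch_add; the name fetch_add_via_cas is made up.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulates fetch_add with a compare-exchange retry loop.
int fetch_add_via_cas(std::atomic<int>& x, int delta) {
    // Read the current value.
    int expected = x.load(std::memory_order_relaxed);
    // Try to store expected + delta. If another thread changed x in the
    // meantime, compare_exchange_weak fails, reloads 'expected' with the
    // current value, and the loop retries.
    while (!x.compare_exchange_weak(expected, expected + delta,
                                    std::memory_order_acq_rel,
                                    std::memory_order_relaxed)) {
    }
    // On success, 'expected' is the value immediately before our write.
    return expected;
}
```

Because the store only succeeds when x still holds the value that was read, the returned value is always the one directly preceding the modification, which is why two callers cannot both get 0.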

You can test it with something like this:

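long repeats = 1000000000;
atomic_long x = 0;    /* shared counter; must be an atomic type for atomic_fetch_add */
atomic_long sum = 0;  /* total of the old values returned to all threads */
void *test_func(void *arg) {
    (void)arg;
    long local_sum = 0;
    for (long i = 0; i < repeats; ++i) {
        /* accumulate the old value returned by each increment */
        local_sum += atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
    }
    atomic_fetch_add(&sum, local_sum);
    return NULL;
}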

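If every old value is returned exactly once, the sum of the returned values is the same as the sum computed by sequential execution. If the two results match, everything is fine.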

    /* no_threads is the number of threads started (2 in the full code) */
    long correct_res = 0;
    for (long i = 0; i < repeats * no_threads; ++i) {
        correct_res = correct_res + i;
    }

Full code:

#include <stdio.h>
#include <pthread.h>
#include <stdatomic.h>

/* Assumes long is 64 bits: the expected sum is about 2e18. */
long repeats = 1000000000;
atomic_long x = 0;    /* shared counter; must be an atomic type for atomic_fetch_add */
atomic_long sum = 0;  /* total of the old values returned to all threads */
void *test_func(void *arg) {
    (void)arg;
    long local_sum = 0;
    for (long i = 0; i < repeats; ++i) {
        /* accumulate the old value returned by each increment */
        local_sum += atomic_fetch_add_explicit(&x, 1, memory_order_relaxed);
    }
    atomic_fetch_add(&sum, local_sum);
    return NULL;
}

int main(void) {
    /* expected result: sum of the old values 0 .. 2 * repeats - 1 */
    long correct_res = 0;
    for (long i = 0; i < repeats * 2; ++i) {
        correct_res = correct_res + i;
    }
    pthread_t pthread[2];
    pthread_create(&pthread[0], NULL, test_func, NULL);
    pthread_create(&pthread[1], NULL, test_func, NULL);

    pthread_join(pthread[0], NULL);
    pthread_join(pthread[1], NULL);
    long res = atomic_load(&sum);
    printf("correct res : %ld\n res : %ld\n", correct_res, res);
    if (correct_res == res)
        printf("Success.\n");
    else
        printf("Failure.\n");
    return 0;
}