"release sequence" 是什么意思？

Question

我不明白，如果我们在下面的例子中有 2 个线程，为什么没有 release sequence 会出现问题。我们对原子变量 count 只有 2 个操作。 count 如输出所示依次递减。

来自 C++ 并发实战 作者 Antony Williams:

I mentioned that you could get a synchronizes-with relationship between a store to an atomic variable and a load of that atomic variable from another thread, even when there’s a sequence of read-modify-write operations between the store and the load, provided all the operations are suitably tagged. If the store is tagged with memory_order_release, memory_order_acq_rel, or memory_order_seq_cst, and the load is tagged with memory_order_consume, memory_order_acquire, or memory_order_seq_cst, and each operation in the chain loads the value written by the previous operation, then the chain of operations constitutes a release sequence and the initial store synchronizes-with (for memory_order_acquire or memory_order_seq_cst) or is dependency-ordered-before (for memory_order_consume) the final load. Any atomic read-modify-write operations in the chain can have any memory ordering (even memory_order_relaxed).

To see what this means (release sequence) and why it’s important, consider an atomic<int> being used as a count of the number of items in a shared queue, as in the following listing.

One way to handle things would be to have the thread that’s producingthe data store the items in a shared buffer and then do count.store(number_of_items, memory_order_release) #1 to let the other threads know that data is available. The threads consuming the queue items might then do count.fetch_sub(1,memory_ order_acquire) #2 to claim an item from the queue, prior to actually reading the shared buffer #4. Once the count becomes zero, there are no more items, and the thread must wait #3.

#include <atomic>
#include <thread>
#include <vector>
#include <iostream>
#include <mutex>

std::vector<int> queue_data;
std::atomic<int> count;
std::mutex m;
void process(int i)
{

    std::lock_guard<std::mutex> lock(m);
    std::cout << "id " << std::this_thread::get_id() << ": " << i << std::endl;
}


void populate_queue()
{
    unsigned const number_of_items = 20;
    queue_data.clear();
    for (unsigned i = 0;i<number_of_items;++i)
    {
        queue_data.push_back(i);
    }

    count.store(number_of_items, std::memory_order_release); //#1 The initial store
}

void consume_queue_items()
{
    while (true)
    {
        int item_index;
        if ((item_index = count.fetch_sub(1, std::memory_order_acquire)) <= 0) //#2 An RMW operation
        {
            std::this_thread::sleep_for(std::chrono::milliseconds(500)); //#3
            continue;
        }
        process(queue_data[item_index - 1]); //#4 Reading queue_data is safe
    }
}

int main()
{
    std::thread a(populate_queue);
    std::thread b(consume_queue_items);
    std::thread c(consume_queue_items);
    a.join();
    b.join();
    c.join();
}

输出（VS2015）：

id 6836: 19
id 6836: 18
id 6836: 17
id 6836: 16
id 6836: 14
id 6836: 13
id 6836: 12
id 6836: 11
id 6836: 10
id 6836: 9
id 6836: 8
id 13740: 15
id 13740: 6
id 13740: 5
id 13740: 4
id 13740: 3
id 13740: 2
id 13740: 1
id 13740: 0
id 6836: 7

If there’s one consumer thread, this is fine; the fetch_sub() is a read, with memory_order_acquire semantics, and the store had memory_order_release semantics, so the store synchronizes-with the load and the thread can read the item from the buffer.

If there are two threads reading, the second fetch_sub() will see the value written by the first and not the value written by the store. Without the rule about the release sequence, this second thread wouldn’t have a happens-before relationship with the first thread, and it wouldn’t be safe to read the shared buffer unless the first fetch_sub() also had memory_order_release semantics, which would introduce unnecessary synchronization between the two consumer threads. Without the release sequence rule or memory_order_release on the fetch_sub operations, there would be nothing to require that the stores to the queue_data were visible to the second consumer, and you would have a data race.

他是什么意思？两个线程应该看到 count 的值是 20？但是在我的输出中 count 在线程中依次递减。

Thankfully, the first fetch_sub() does participate in the release sequence, and so the store() synchronizes-with the second fetch_sub(). There’s still no synchronizes-with relationship between the two consumer threads. This is shown in figure 5.7. The dotted lines in figure 5.7 show the release sequence, and the solid lines show the happens-before relationships

Answer 1

What does he mean? That both threads should see the value of count is 20? But in my output count is sequently decremented in threads.

不，他没有。 count 的所有修改都是原子的，因此两个 reader 线程在给定代码中总是会看到不同的值。

他在谈论释放顺序规则的含义，即当给定线程执行 release 存储时，其他多个线程随后执行 acquire 相同位置的负载形成一个 释放序列 ，其中每个后续 acquire 负载与 happens-before 关系存储线程（即存储完成 发生在 加载之前）。这意味着 reader 线程中的加载操作是与 writer 线程的同步点，并且 writer 中存储之前的所有内存操作必须在其相应加载时完成并在 reader 中可见完成。

他是说如果没有这个规则，那么只有第一个线程会同步到作者。因此，第二个线程在访问 queue 时会发生数据竞争（注意： 而不是 count，无论如何它都受到原子访问的保护）。理论上，count 上的 store 之前发生的数据内存操作只能在 count 上的 reader 线程号 2 看到。发布顺序规则确保不会发生这种情况。

总结：发布顺序规则确保多个线程可以在单个存储上同步它们的负载。所讨论的同步是对数据的内存访问 other 而不是被同步的实际原子变量（由于是原子的，无论如何都保证同步）。

要在此处添加的注意事项：在大多数情况下，此类问题仅在 CPU 对重新排序其内存操作很宽松的体系结构上才值得关注。 Intel 架构不是其中之一：它是 强排序的 并且只有少数非常特殊的情况可以对内存操作进行重新排序。这些细微差别大多只在谈论其他架构时才有意义，例如 ARM 和 PowerPC。

Answer 2

fetch_sub 是读-修改-写操作。它以原子方式从内存地址读取值，按提供的参数递减它，然后将其写回内存地址。这一切都是自动发生的。

现在，每个原子操作都直接读取和写入内存地址。 CPU 不依赖缓存在寄存器或缓存行中的值来提高性能。它直接读取和写入内存地址，并防止其他 CPU 在那段时间这样做。

"plain" (==relaxed) 原子性不提供的是重新排序。编译器和 CPU 都为了加快程序的执行速度而进行了读写。

看下面的例子：

atomic integer i
regular integer j

Thread A:
i <- 5
//do something else
i -> j
//make some decisions regarding j value.

Thread B:
i++

如果没有提供内存顺序，则允许编译器和 CPU 将代码转换为

Thread A:
i -> j
i <- 5
//do something else
//make some decisions regarding j value.

Thread B:
i++

这当然不是我们想要的。决策错误。

我们需要的是内存重新排序。

内存顺序获取：不要在
之前打乱内存访问内存顺序释放：不要打乱内存访问 after

回到问题：

fetch_sub既是读一个值，也是写一个值。通过指定 memory order acquire 我们说 "I only care about the order of actions the happened before the reading"
通过指定 memory order release 我们说“我只关心在写作之后发生的动作顺序。

但你确实关心前后的内存访问！

如果你只有一个消费者线程，那么 sub_fetch 不会影响任何人，因为生产者无论如何都使用普通的 store 并且 fetch_sub 的影响只对消费者可见调用 fetch_sub 的线程。在这种情况下，您只关心阅读 - 阅读为您提供当前和更新的索引。存储更新后的索引（假设 x-1）后发生的事情并不那么重要。

但是由于有两个线程读取和写入 counter 重要的是线程 A 会知道线程 B 向计数器写入了 一个新值，并且线程 B 知道线程 A 正在读取计数器的值。反之亦然——线程 B 必须知道线程 A 向 counter 写入了一个新值 counter 并且线程 A 必须知道线程 B 即将读取来自计数器的值

您需要两种保证 - 每个线程都声明它即将读取和写入共享计数器。您需要的内存顺序是 std::memory_order_acquire_release。

但是这个例子很棘手。生产者线程只是在 counter 中存储一个新值，而不管之前的值是什么 。如果生产者线程每次推送新项目时都要 incremenet 计数器 - 你必须在生产者和中使用 std::memory_order_acquire_release消费者线程 即使你只有一个消费者

Answer 3

我无意中遇到了和你一样的问题。我以为我的理解是正确的，然后他提出了这个例子并且只使用了 std::memory_order_aquire。很难找到任何关于此的有用信息，但最终我找到了一些有用的资源。我不知道的主要信息是一个简单的事实，即读-修改-写操作总是对 newest/latest 值起作用，无论给出什么内存顺序（甚至 std::memory_order_relaxed）。这确保您不会在示例中两次使用相同的索引。操作的顺序仍然会混淆（所以你不知道哪个 fetch_sub 会先于另一个发生）。

这是 anthony williams 自己的回答，他说读-修改-写操作总是对最新值有效：Concurrency: Atomic and volatile in C++11 memory model

此外，有人询问了 fetch_sub 与 shared_ptr 引用计数的结合。在这里，安东尼·威廉姆斯也做出了回应，并通过 fetch_sub 的重新排序使情况变得清晰： https://groups.google.com/a/isocpp.org/forum/#!topic/std-discussion/OHv-oNSuJuk

Answer 4

这意味着初始存储与最终加载同步，即使最终加载读取的值与开始时存储的值不直接相同，但它是由可能竞争的原子指令之一修改的值。一个更简单的例子，假设有三个线程竞速执行这些指令（假设 x 在竞速前初始化为 0）

// Thread 1:
A;
x.store(2, memory_order_release);

// Thread 2:
B;
int n = x.fetch_add(1, memory_order_relaxed);
C;

// Thread 3:
int m = x.load(memory_order_acquire);
D;

根据比赛的可能结果，n和m的可能值是多少？根据我们在 m 和 n? 对于 n，我们有两种情况，0 或 2。对于 m，我们可以读取 0、1、2 和 3。两者有六种有效组合。让我们看看每个案例：

m = 0, n = 0。我们没有任何 synchronizes-with 关系，因此我们无法推断出任何 happens-before 关系，除了明显的 B 先于 C
m = 0, n = 2。即使 fetch_add 操作读取了 store 写入的值，因为 fetch_add 有一个 relaxed 内存排序，所以没有 synchronizes-with两条指令的关系。我们不能说 A happens-before C
m = 1, n = 0。与以前类似，由于 fetch_add 没有 release 语义，我们无法推断出 fetch_add 和 同步与 之间的关系load 操作，因此我们不知道是否 B happens-before D
m = 2, n = 0。我们用 acquire 语义 load 读取的值已经用 release 语义 store 写入。我们保证 store 与同步 load，因此 A 发生在 D
m = 2, n = 2。与上面相同，store 与同步 load，因此 A happens-before D。与往常一样，从 fetch_add 读取的值与从线程 1 读取的值 stored 相同这一事实并不意味着任何同步关系。
m = 3, n = 2。此时load读取的数据已经被fetch_add写入，fetch_add读取的数据已经被store写入。然而，由于 fetch_add 具有 relaxed 语义，因此可以假定 store 和 fetch_add 之间以及 fetch_add 和 load 之间没有同步。显然，在这种情况下，可以假定没有同步，与 m = 0, n = 0 的情况相同。这里是 release sequence 概念派上用场的地方：线程 1 中的 release 语义 store 将 synchronize-with线程 3 中的 acquire 语义 load 只要正在读取的值已写入 release sequence，其中包括
1. 稍后在与释放操作相同的线程中执行的所有存储
2. 从同一释放序列读取值的所有原子读取-修改-写入操作。
在这种情况下，因为 fetch_add 是一个原子读取-修改-写入操作，我们知道线程 1 中的 store 与 同步 load 在线程 3 中，因此 A 发生在 D 之前。不过，我们仍然无法确定 B 和 C 的顺序。

在你的情况下你有这个伪代码，假设 number_of_items = 2:

// Thread 1
Item[0] = ...;
Item[1] = ...;
count.store(2,memory_order_release);

// Thread 2
int i2 = 0;
while (i2 = count.fetch_sub(1,memory_order_acquire) <= 0 ) sleep();
auto x2 = Item[i2-1];
process(x2);

// Thread 3
int i3 = 0;
while (i3 = count.fetch_sub(1,memory_order_acquire) <= 0 ) sleep();
auto x3 = Item[i3-1];
process(x3);

假设读入i2的第一个正值是2，那么读入i3的第一个正值是1。由于从线程 2 读取的值已从线程 1 中的存储写入，存储 与负载同步 ，并且我们知道 Item[1] = ...; 来自线程 1 happens-before auto x2 = Item[1]; 在线程 2 中。但是从线程 3 读取的值 1 已被线程 2 写入，其中 fetch_sub 没有 release 语义。因此，来自线程 2 的 fetch_sub 不会 与来自线程 3 的 fetch_sub 同步，但是由于来自线程 2 的 fetch_sub 是线程1中store的释放链，线程1中的store也同步fetch_sub 在线程 3 中，从中我们知道 Item[0] = ...; 发生在 auto x3 = Item[0];

之前

"release sequence" 是什么意思？

What does "release sequence" mean?

c++

multithreading

memory-model

c++11

stdatomic