Thread Safety in .NET TPL Dataflow Source

Out of curiosity, I was looking at how some parts of the .NET TPL Dataflow library are implemented, and I came across the following snippet:

    private void GetHeadTailPositions(out Segment head, out Segment tail,
        out int headLow, out int tailHigh)
    {
        head = _head;
        tail = _tail;
        headLow = head.Low;
        tailHigh = tail.High;
        SpinWait spin = new SpinWait();

        //we loop until the observed values are stable and sensible.  
        //This ensures that any update order by other methods can be tolerated.
        while (
            //if head and tail changed, retry
            head != _head || tail != _tail
            //if low and high pointers, retry
            || headLow != head.Low || tailHigh != tail.High
            //if head jumps ahead of tail because of concurrent grow and dequeue, retry
            || head._index > tail._index)
        {
            spin.SpinOnce();
            head = _head;
            tail = _tail;
            headLow = head.Low;
            tailHigh = tail.High;
        }
    }

(Viewable here: https://github.com/dotnet/corefx/blob/master/src/System.Threading.Tasks.Dataflow/src/Internal/ConcurrentQueue.cs#L345)

From my understanding of thread safety, this operation is prone to data races. I will explain my understanding, and then what I believe the 'error' to be. Of course, I think it is far more likely that the error is in my mental model rather than in the library, and I hope someone here can point out where I'm going wrong.

...

All of the fields in question (head, tail, head.Low, tail.High) are volatile. To my understanding, this provides two guarantees:

From what I understand of the method, the following happens:

  1. An initial read of the ConcurrentQueue's internal state is made (that is, head, tail, head.Low, tail.High).
  2. A busy-wait spin is performed.
  3. The method then reads the internal state again and checks for any changes.
  4. If the state has changed, go back to step 2 and repeat.
  5. Return once the state that was read is considered 'stable'.

Now, assuming all of that is correct, my "problem" is this: the state reads above are not atomic. I don't see anything that prevents a half-written state from being read (for example, a writer thread has updated head but not yet tail).
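
To make that concrete, here is a standalone sketch (not the library code; the two counters are just stand-ins for the head and tail indices) of the kind of interleaving I have in mind: the writer advances the two fields one after the other, and a reader that samples them at different moments can observe a pair of values that never existed at any single instant.

    using System;
    using System.Threading;

    class TornSnapshotSketch
    {
        // Stand-ins for _head/_tail: the writer bumps _headIndex first and
        // _tailIndex second, so at any single instant tail <= head.
        private static volatile int _headIndex;
        private static volatile int _tailIndex;

        static void Main()
        {
            long tornPairs = 0;

            var writer = new Thread(() =>
            {
                for (int i = 1; i <= 5_000_000; i++)
                {
                    _headIndex = i;
                    _tailIndex = i;
                }
            });

            var reader = new Thread(() =>
            {
                while (_tailIndex < 5_000_000)
                {
                    int head = _headIndex;   // sample one field...
                    int tail = _tailIndex;   // ...then the other, a little later
                    if (tail > head)         // a pair that never existed at one instant
                        tornPairs++;
                }
            });

            writer.Start();
            reader.Start();
            writer.Join();
            reader.Join();

            Console.WriteLine($"Observed pairs that never coexisted: {tornPairs}");
        }
    }

In terms of the quoted code, this is the "head jumps ahead of tail because of concurrent grow and dequeue" case that the comment mentions.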

Now, I do somewhat realize that a half-written state in a buffer like this is not the end of the world - after all, the head and tail pointers can perfectly well be updated/read independently, usually in a CAS/spin loop.

But then I really don't see what the point of spinning once and reading again is. Are you really going to 'catch' a change made in the time it takes to perform a single spin? What is it trying to 'guard' against? Put another way: if the whole state read were atomic, I don't see how this method helps, and if it isn't, then what is the method actually doing?

You are right, but note that the values produced by GetHeadTailPositions are later used as a snapshot by ToList, Count, and GetEnumerator.
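
For context, here is a simplified, standalone sketch of how a Count-style value can be derived from such a (head, tail, headLow, tailHigh) snapshot. This is not the library's source; SEGMENT_SIZE and SegmentSnapshot are made-up stand-ins for illustration.

    using System;

    static class SnapshotCountSketch
    {
        private const int SEGMENT_SIZE = 32;   // assumed fixed segment capacity

        // Stand-in for the segment references handed back by GetHeadTailPositions;
        // only the running segment index matters for the arithmetic below.
        private readonly struct SegmentSnapshot
        {
            public SegmentSnapshot(long index) { Index = index; }
            public long Index { get; }
        }

        private static int Count(SegmentSnapshot head, SegmentSnapshot tail, int headLow, int tailHigh)
        {
            if (head.Index == tail.Index)
            {
                // Head and tail are the same segment: live items occupy slots headLow..tailHigh.
                return tailHigh - headLow + 1;
            }

            // Items left in the head segment, plus the full segments in between,
            // plus the items already written into the tail segment.
            int count = SEGMENT_SIZE - headLow;
            count += SEGMENT_SIZE * (int)(tail.Index - head.Index - 1);
            count += tailHigh + 1;
            return count;
        }

        static void Main()
        {
            // Example: head segment #3 with 10 slots already consumed (headLow = 10),
            // tail segment #5 with 7 slots filled so far (tailHigh = 6):
            // (32 - 10) + 32 * 1 + (6 + 1) = 61
            Console.WriteLine(Count(new SegmentSnapshot(3), new SegmentSnapshot(5), headLow: 10, tailHigh: 6));
        }
    }

If the snapshot were torn in the way the question describes (head observed ahead of tail), this arithmetic would go negative, which is exactly what the head._index > tail._index check in the retry loop rules out.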

More worrying is that the concurrent queue might hold on to values indefinitely. While the private field ConcurrentQueue<T>._numSnapshotTakers is non-zero, it refrains from nulling out entries, or from setting them to the default value for value types.
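
A simplified sketch of that gating, with made-up names and structure (this is a model of the behaviour described, not the library's code): snapshot takers bump a shared counter around their work, and the dequeue path only drops its reference to a slot while no snapshot is in flight.

    using System.Collections.Generic;
    using System.Threading;

    class SnapshotGateSketch<T>
    {
        private readonly T[] _slots = new T[32];
        private int _numSnapshotTakers;   // > 0 while an enumeration/ToList/peek-style snapshot is in progress

        // Callers that need snapshot semantics bracket their work with the counter.
        public List<T> TakeSnapshot(int low, int high)
        {
            Interlocked.Increment(ref _numSnapshotTakers);
            try
            {
                var result = new List<T>();
                for (int i = low; i <= high; i++)
                    result.Add(_slots[i]);
                return result;
            }
            finally
            {
                Interlocked.Decrement(ref _numSnapshotTakers);
            }
        }

        // The dequeue path only clears the slot when nobody is snapshotting;
        // otherwise the element stays reachable until the whole segment is discarded.
        public void ClearSlotAfterDequeue(int index)
        {
            if (Volatile.Read(ref _numSnapshotTakers) == 0)
                _slots[index] = default(T);
        }
    }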

Stephen Toub blogged about this in ConcurrentQueue<T> holding on to a few dequeued elements:

For better or worse, this behavior in .NET 4 is actually “by design.” The reason for this has to do with enumeration semantics. ConcurrentQueue<T> provides “snapshot semantics” for enumeration, meaning that the instant you start enumerating, ConcurrentQueue<T> captures the current head and tail of what’s currently in the queue, and even if those elements are dequeued after the capture or if new elements are enqueued after the capture, the enumeration will still return all of and only what was in the queue at the time the enumeration began. If elements in the segments were to be nulled out when they were dequeued, that would impact the veracity of these enumerations.

For .NET 4.5, we’ve changed the design to strike what we believe to be a good balance. Dequeued elements are now nulled out as they’re dequeued, unless there’s a concurrent enumeration happening, in which case the element isn’t nulled out and the same behavior as in .NET 4 is exhibited. So, if you never enumerate your ConcurrentQueue<T>, dequeues will result in the queue immediately dropping its reference to the dequeued element. Only if when the dequeue is issued someone happens to be enumerating the queue (i.e. having called GetEnumerator on the queue and not having traversed the enumerator or disposed of it yet) will the null’ing out not happen; as with .NET 4, at that point the reference will remain until the containing segment is removed.

As you can see from the source, getting an enumerator (via the generic GetEnumerator<T> or the non-generic GetEnumerator), calling ToList (or ToArray, which uses ToList), or calling TryPeek can cause references to be retained even after items have been removed. Admittedly, the race between TryDequeue (which calls ConcurrentQueue<T>.Segment.TryRemove) and TryPeek may be hard to trigger, but it is there.
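
If you want to see the retention for yourself, here is a small repro sketch against the public System.Collections.Concurrent.ConcurrentQueue<T> (which, as far as I can tell, the Dataflow-internal copy mirrors). GC timing is not guaranteed, so treat the output as indicative only; run it as a Release build on a .NET 4.5-era runtime to match the behaviour described above, since later runtimes have reworked ConcurrentQueue<T> internally.

    using System;
    using System.Collections.Concurrent;

    class QueueRetentionDemo
    {
        static WeakReference DequeueWhileEnumerating(ConcurrentQueue<object> queue, out IDisposable enumerator)
        {
            var payload = new object();
            queue.Enqueue(payload);

            // Start (but do not traverse) an enumeration: this registers a snapshot taker.
            enumerator = queue.GetEnumerator();

            object dequeued;
            queue.TryDequeue(out dequeued);   // logically removed from the queue

            return new WeakReference(payload);
        }

        static void Main()
        {
            var queue = new ConcurrentQueue<object>();

            IDisposable enumerator;
            WeakReference weak = DequeueWhileEnumerating(queue, out enumerator);

            GC.Collect();
            GC.WaitForPendingFinalizers();
            GC.Collect();

            // With the enumerator still alive, the dequeued element is typically
            // still reachable through the queue's internal segment.
            Console.WriteLine($"Still reachable while enumerator is alive: {weak.IsAlive}");

            enumerator.Dispose();
        }
    }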