解释 Michael & Scott 无锁队列算法

Question

我正在研究 Michael & Scott 的无锁队列算法并尝试用 C++ 实现它。

但是我在我的代码中产生了一个竞争，我认为算法中可能存在一个竞争。

我在这里阅读了论文：简单、快速、实用的非阻塞和阻塞并发队列算法而原来的Dequeue伪代码如下：

dequeue(Q: pointer to queue_t, pvalue: pointer to data type): boolean
D1:   loop                          // Keep trying until Dequeue is done
D2:      head = Q->Head             // Read Head
D3:      tail = Q->Tail             // Read Tail
D4:      next = head.ptr->next      // Read Head.ptr->next
D5:      if head == Q->Head         // Are head, tail, and next consistent?
D6:         if head.ptr == tail.ptr // Is queue empty or Tail falling behind?
D7:            if next.ptr == NULL  // Is queue empty?
D8:               return FALSE      // Queue is empty, couldn't dequeue
D9:            endif
                // Tail is falling behind.  Try to advance it
D10:            CAS(&Q->Tail, tail, <next.ptr, tail.count+1>)
D11:         else                    // No need to deal with Tail
               // Read value before CAS
               // Otherwise, another dequeue might free the next node
D12:            *pvalue = next.ptr->value
               // Try to swing Head to the next node
D13:            if CAS(&Q->Head, head, <next.ptr, head.count+1>)
D14:               break             // Dequeue is done.  Exit loop
D15:            endif
D16:         endif
D17:      endif
D18:   endloop
D19:   free(head.ptr)                // It is safe now to free the old node
D20:   return TRUE                   // Queue was not empty, dequeue succeeded

在我看来，比赛是这样的：

线程 1 前进到 D3，然后停止。
线程2进阶到D3，读取与线程1相同的头
线程 2 幸运地一直推进到 D20，在 D19 它释放了 head.ptr
线程 1 继续并前进到 D4，尝试读取 head.ptr->next，但由于 head.ptr 已被线程 1 释放，因此发生崩溃。

而且我的 C++ 代码确实总是在线程 1 的 D4 处崩溃。

谁能指出我的错误并给出一些解释？

Answer 1

谢谢，非常有趣的主题！它看起来确实像一个错误，但该论文的一位作者断言他们的 free() 不是我们都使用的普通 free()，而是一些神奇的 free()，所以没有错误。太棒了。

见http://blog.shealevy.com/2015/04/23/use-after-free-bug-in-maged-m-michael-and-michael-l-scotts-non-blocking-concurrent-queue-algorithm/

希望没有人在没有彻底分析的情况下将其投入生产。

Answer 2

这实际上是自 MS 队列的作者之一 Maged Michael 引入 Hazard Pointers [1] 以来一直在探索的非阻塞内存回收问题。

危险指针允许线程保留块，以便其他线程在它完成之前不会真正回收它们。但是，这种机制会带来不小的性能开销。

还有许多基于 epoch 的回收变体，例如 RCU [2,3]，以及最近的基于间隔的回收 (IBR) [4]。它们通过保留纪元来避免 use-after-free，并且比 Hazard Pointers 更快。据我所知，基于epoch的回收被广泛采用来处理这个问题。

您可以查看下面提到的这些论文以了解更多详细信息。 Interval-Based Memory Reclamation 的论文讨论了很多背景知识。

这是非阻塞数据结构中的普遍问题，我们一般不会将其视为数据结构本身的错误---毕竟它只出现在使用手动内存管理的语言中，如C/C++ 但不是像 Java 这样的（顺便说一句，Michael & Scott Queue 已经在 Java Concurrency 中被采用多年）。

参考：

[1] 危险提示：无锁对象的安全内存回收，Maged M. Michael，IEEE Transactions on Parallel and Distributed Systems，2004。

[2] 无锁同步的内存回收性能，Thomas E. Hart 等人，并行和分布式计算杂志，2007 年。

[3] 阅读副本更新，Paul E. McKenney 等人，渥太华 Linux 研讨会，2002 年。

[4] Interval-Based Memory Reclamation，Haosen Wen 等人，第 23 届 ACM SIGPLAN 并行编程 (PPoPP) 原则与实践研讨会论文集，2018 年。

Answer 3

是的，问题可能在 D19。

正如指出的那样 不是正常的 free().

这在介绍 §1 中有说明

A node is freed only when no pointers in the data structure or temporary variables point to it.

解释 Michael & Scott 无锁队列算法

Explain Michael & Scott lock-free queue alorigthm

queue

concurrency

nonblocking

lock-free

data-structures