这个危险指针示例是否因为 ABA 问题而存在缺陷?

Is this hazard pointer example flawed because of ABA issue?

作者在C++ Concurrency in Action一书中给出了使用冒险指针实现无锁栈数据结构的例子。部分代码如下:

std::shared_ptr<T> pop()
{
    std::atomic<void*>& hp=get_hazard_pointer_for_current_thread();
    node* old_head=head.load();
    node* temp;
    do
    {
        temp=old_head;
        hp.store(old_head);
        old_head=head.load();
    } while(old_head!=temp);
    // ...
}

描述说

You have to do this in a while loop to ensure that the node hasn’t been deleted between the reading of the old head pointer and the setting of the hazard pointer. During this window no other thread knows you’re accessing this particular node. Fortunately, if the old head node is going to be deleted, head itself must have changed, so you can check this and keep looping until you know that the head pointer still has the same value you set your hazard pointer to.

我认为代码有缺陷,因为 head 节点受制于 ABA problem。即使head的值保持不变,它原来指向的节点也可能已经被删除了。分配了一个新的 head 节点,恰好与前一个节点具有相同的地址值。

load() 操作的默认 memory_orderstd::memory_order_seq_cst,这确保了所有操作的顺序一致性(全局总排序):

Each memory_order_seq_cst operation B that loads from atomic variable M, observes one of the following:

  • the result of the last operation A that modified M, which appears before B in the single total order
  • OR, if there was such an A, B may observe the result of some modification on M that is not memory_order_seq_cst and does not happen-before A
  • OR, if there wasn't such an A, B may observe the result of some unrelated modification of M that is not memory_order_seq_cst.

因此,如果节点被修改(删除)并且这发生在第二次读取全局总顺序之前,您肯定会看到该更改,因此循环将继续执行。如果在之后进行此修改,则不会有任何危害,因为已经设置了危险指针。

你有这个保证,因为存储到危险指针也是用 std::memory_order_seq_cst 完成的。此内存顺序为您提供 acquire 加载操作和 release 存储操作,防止在同一线程内重新排序。因此,"successful" read (old_head==temp) 保证保存了正确的数据。

将这两个负载视为同步点 - 因为它们执行 acquire 操作,所以它们与相应的 release 操作同步这些值,导致所有写入变得可见。


您描述的问题不会以任何方式影响示例。 pop() 函数旨在删除顶部元素,它会做到这一点。如果在此期间,元素是 added/removed,它将弹出它,无论它的地址是什么(它甚至可能与之前获取的地址相同)。这是一个完全不同的问题。考虑:

concurrent_stack<int> p;
if (!p.empty() && (p.top() == 5))
{
  auto t = p.pop();
  assert( t ); // May fail
  assert( *t == 5 ); // May fail
}

两个断言都可能失败,如果许多线程非常密集地使用堆栈,很可能会经常失败。但这不是由于 pop() 的错误实现,而是您需要更强的访问限制以确保确实从堆栈中删除最后检查的元素。