一把 readers/writer 锁……没有读者锁？

Question

我觉得这可能是一种非常普遍和常见的情况，存在众所周知的无锁解决方案。

简而言之，我希望有像 readers/writer 锁这样的方法，但这不需要 reader 获取锁，因此可以获得更好的平均性能。

取而代之的是 reader 的一些原子操作（128 位 CAS）和一个 writer 的互斥量。我将拥有数据结构的两个副本，一个只读的副本用于正常成功的查询，另一个相同的副本在互斥锁保护下进行更新。将数据插入可写副本后，我们将其设为新的可读副本。然后依次插入旧的可读副本，一旦所有未决的 readers 都读完它，编写器旋转剩余的 readers 的数量直到其为零，然后依次修改它，最后释放互斥量。

或者类似的东西。

是否存在类似的情况？

Answer 1

如果您的数据适合 64 位值，大多数系统可以便宜地 read/write 原子地做到这一点，因此只需使用 std::atomic<my_struct>.

对于较小的 and/or 不经常写入的数据，有几种方法可以使 reader 在共享数据上真正只读，而不是必须在共享计数器或任何东西上执行任何原子 RMW 操作。这允许读取端扩展到多个线程，而不会 readers 相互竞争（不像 x86 上使用 lock cmpxchg16b¹ 的 128 位原子读取，或采用一个 RWlock).

理想情况下只是通过atomic<T*>指针（RCU）的额外间接级别，或者只是额外的加载+比较和分支（SeqLock）；没有原子 RMW 或内存屏障强于 acq/rel 或读取端的任何其他内容。

这可能适用于许多线程非常频繁读取的数据，例如由定时器中断更新但在整个地方读取的时间戳。或通常永远不会更改的配置设置。

如果您的数据较大 and/or 更改更频繁，其他答案 中建议的策略之一需要 reader 仍然采用 RWlock在某事上或原子地增加一个计数器会更合适。这不会完美扩展，因为每个 reader 仍然需要获得包含锁或计数器的共享缓存行的独占所有权，以便它可以修改它，但天下没有免费的午餐。

注 1：更新：x86 供应商最终决定保证 128 位 SSE/AVX 加载/存储在 CPU 上与 AVX 是原子的，因此，如果幸运的话，std::atomic<16-byte-struct> 在启用 AVX 的 CPU 上运行时负载便宜。例如不是 Pentium/Celeron 在冰湖之前。一段时间以来，GCC 一直在为 16 字节操作间接指向 libgcc atomic_load_16 函数，因此运行时调度它可以在支持它的 CPU 上选择 lock cmpxchg16b 策略。现在它在某些 CPU 上有更好的选择。

RCU

听起来 你已经完成了发明 RCU（阅读副本更新）的一半，你更新了指向新版本的指针。

但请记住，无锁 reader 可能会在加载指针后停止，因此您遇到了释放问题。这是 RCU 的难点。在内核中，可以通过设置同步点来解决，您知道没有 readers 早于某个时间 t，因此可以释放旧版本。有一些 user-space 实现。 https://en.wikipedia.org/wiki/Read-copy-update and https://lwn.net/Articles/262464/.

对于 RCU，更改的频率越低，可以证明复制的数据结构越大。 例如即使是一个中等大小的树也是可行的，如果它只是由管理员交互地改变的话，而 reader 是运行在几十个内核上并行检查一些东西。例如内核配置设置是 RCU 在 Linux.

中表现出色的一件事

序列锁

如果您的数据很小（例如 32 位机器上的 64 位时间戳），另一个不错的选择是 SeqLock。读者将序列计数器 before/after 数据的非原子副本检查到私有缓冲区中。如果序列计数器匹配，我们就知道没有撕裂。（作者用单独的互斥体互相排斥）。 / how to implement a seqlock lock using c++11 atomic library.

在 C++ 中编写一些可以有效编译为可能有撕裂的非原子副本的东西有点 hack，因为这不可避免地是数据争用 UB。（除非你对每个块分别使用 std::atomic<long> 和 mo_relaxed，但是这样你就无法使用 movdqu 或其他东西一次复制 16 个字节。）

SeqLock 使 reader 每次读取都复制整个东西（或者理想情况下只是将其加载到寄存器中），因此它只适用于小型结构或 128 位整数或其他东西. 但是对于小于 64 字节的数据，它可能非常好，比让 readers 使用 lock cmpxchg16b 作为 128 位数据更好，如果你有很多 reader s 和不频繁的写入。

虽然它不是无锁的：在修改 SeqLock 时休眠的编写器可能会 readers 无限期地重试。对于一个小的 SeqLock，window 很小，显然您希望在执行第一个序列计数器更新之前准备好所有数据，以尽量减少中断在更新中暂停编写器的机会。

最好的情况是只有 1 个写入器，因此它不需要进行任何锁定；它不知道其他任何东西都会修改序列计数器。

Answer 2

您描述的内容与 double instance locking and left-right concurrency control 非常相似。

在进度保证上，两者的区别在于前者是读锁免锁，后者是免等待。两者都阻止了作家。

Answer 3

我得到的解决方案：

每个线程都有一份thread_local的数据结构副本，这个可以无锁随意查询。任何时候找到您的数据，太棒了，您就大功告成了。

如果您没有找到您的数据，那么您将获得主副本的互斥量。

这可能会有许多来自其他线程的新插入（可能包括您需要的数据！）。检查它是否有您的数据，如果没有则插入它。

最后，将所有最新更新（包括您需要的数据条目）复制到您自己的 thread_local 副本中。释放互斥量并完成。

读者可以全天候并行阅读，即使更新正在发生，没有锁。只有在写入时（或有时在追赶时）才需要锁。这种通用方法适用于范围广泛的底层数据结构。 QED

如果有很多线程使用这种结构，那么拥有许多 thread_local 索引听起来内存效率很低。

但是索引找到的数据，如果是只读的，只需要有一份，被多个索引引用。（幸运的是，我就是这种情况。）

另外，许多线程可能不会随机访问整个条目范围；也许有些只需要几个条目，并且很快就会达到最终状态，在这种状态下，他们的本地结构副本可以在它增长很多之前找到所有需要的数据。然而，许多其他线程可能根本没有提到这一点。（幸运的是，我就是这种情况。）

最后，对于 "copy all the recent updates" 如果添加到结构中的所有新数据都被推到向量的末尾，那么假设您的本地副本中有 4000 个条目，那么master copy 有 4020，你可以用几个机器周期定位到你需要添加的 20 个对象。（幸运的是，我就是这种情况。）

Answer 4

原来我想的双结构解决方案与http://concurrencyfreaks.blogspot.com/2013/12/left-right-concurrency-control.html

有相似之处

这是我想到的具体数据结构和伪代码。

我们分配了一些名为 MyMap 的任意数据结构的两个副本，还有两个一组三个指针中的指针指向这两个。最初，一个是指向通过 achReadOnly[0].pmap 和另一个通过 pmapMutable.

关于 achReadOnly 的快速说明：它有一个正常状态和两个临时状态。正常状态将是（单元格 0/1 的 WLOG）：

achReadOnly = { { pointer to one data structure, number of current readers },
                { nullptr, 0 } }
pmapMutable = pointer to the other data structure

当我们完成变异后 "the other," 我们将它存储在数组未使用的槽中因为它是下一代只读版，所以读者可以开始访问它。

achReadOnly = { { pointer to one data structure, number of old readers },
                { pointer to the other data structure, number of new readers } }
pmapMutable = pointer to the other data structure

writer然后清除指向"the one"的指针，上一代readonly，强制读者去下一代。我们将其移至 pmapMutable。

achReadOnly = { { nullptr, number of old readers },
                { pointer to the other data structure, number of new readers } }
pmapMutable = pointer to the one data structure

然后作者旋转让一些老读者在这一点上击中一个（他自己）它可以接收相同的更新。该 1 被 0 覆盖以进行清理以准备继续前进。尽管实际上它可能会变脏，因为它在被覆盖之前不会被引用。

struct CountedHandle {
    MyMap*   pmap;
    int      iReaders;
};

// Data Structure:
atomic<CountedHandle> achReadOnly[2];
MyMap* pmapMutable;
mutex_t muxMutable;

data Read( key ) {
    int iWhich = 0;
    CountedHandle chNow, chUpdate;

    // Spin if necessary to update the reader counter on a pmap, and/or
    // to find a pmap (as the pointer will be overwritten with nullptr once
    // a writer has finished updating the mutable copy and made it the next-
    // generation read-only in the other slot of achReadOnly[].

    do {
        chNow = achReadOnly[ iWhich ];
        if ( !chNow .pmap ) {
            iWhich = 1 - iWhich;
            continue;
        }
        chUpdate = chNow;
        chNow.iReaders++;
    } while ( CAS( ach[ iWhich ], chNow, chUpdate ) fails );

    // Now we've found a map, AND registered ourselves as a reader of it atomicly.
    // Importantly, it is impossible any reader has this pointer but isn't
    // represented in that count.

    if ( data = chnow.pmap->Find( key ) ) {
        // Deregister ourselves as a reader.
        do {
            chNow = achReadOnly[ iWhich ];
            chUpdate = chNow;
            chNow.iReaders--;
        } while ( CAS( ach[ iWhich ], chNow, chUpdate ) fails );

        return data;
    }

    // OK, we have to add it to the structure.

    lock muxMutable;
    figure out data for this key
    pmapMutable->Add( key, data );

    // It's now the next-generation read-only.  Put it where readers can find it.
    achReadOnly[ 1 - iWhich ].pmap = pmapMutable;

    // Prev-generation readonly is our Mutable now, though we can't change it
    // until the readers are gone.
    pmapMutable = achReadOnly[ iWhich ].pmap;

    // Force readers to look for the next-generation readonly.
    achReadOnly[ iWhich ].pmap = nullptr;

    // Spin until all readers finish with previous-generation readonly.
    // Remember we added ourselves as reader so wait for 1, not 0.

    while ( achReadOnly[ iWhich ].iReaders > 1 }
        ;

    // Remove our reader count.
    achReadOnly[ iWhich ].iReaders = 0;

    // No more readers for previous-generation readonly, so we can now write to it.
    pmapMutable->Add( key, data );

    unlock muxMutable;

    return data;

}

一把 readers/writer 锁……没有读者锁？

A readers/writer lock... without having a lock for the readers?

c++

concurrency

lock-free

lockless

stdatomic

RCU

序列锁