C++. Weighted std::shuffle

Is there a nice, elegant way to do a weighted shuffle using the standard library? There is std::discrete_distribution. What I want is something like this:

std::vector<T> data { N elements };
std::vector<int> weights { N weights };
std::shuffle(std::begin(data), std::end(data), something based on discrete distribution);
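For reference, std::discrete_distribution on its own draws a single index i with probability weights[i] / sum(weights), with replacement. That is one weighted pick, not a permutation, which is why it cannot simply be passed to std::shuffle. A minimal sketch of that behaviour (the helper name count_weighted_picks is hypothetical):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: draws `n` indices from std::discrete_distribution and
// counts how often each index comes up. Index i is produced with probability
// weights[i] / sum(weights); draws are independent (with replacement), so this
// is a weighted pick, not a shuffle.
std::vector<int> count_weighted_picks(const std::vector<double>& weights,
                                      int n, unsigned seed)
{
    std::mt19937 rng(seed);
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    std::vector<int> counts(weights.size(), 0);
    for (int k = 0; k < n; ++k)
        ++counts[static_cast<std::size_t>(dist(rng))];
    return counts;
}
```

With weights {1, 2, 3, 4}, index 3 should come up in roughly 40% of the draws.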

Suppose the OP's intent is to shuffle a list a of items into a random order r such that, given a list of weights w, the element a[i] with weight w[i] is the first element of the random shuffle r with probability w[i]/sum(w).

As stated on the page linked by Severin Pappadeux:

Weighted random shuffling is the same as weighted random sampling from a list a without replacement. That is, choose with probability w[i]/sum(w) element a[i] from a. Store this element in a list r. Then, remove element a[i] from a and w[i] from w, and select a new element of the modified list a, and so on until a is empty.

I'm not aware of such an algorithm in the standard library, but a simple implementation could be:

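#include <random>
#include <algorithm>
#include <iterator>

// Weighted shuffle of [first, last) in place. [first_weight, last_weight)
// holds the matching weights and is permuted alongside the data.
// Requires C++17 (class template argument deduction for discrete_distribution).
// O(n^2): the distribution is rebuilt over the remaining weights at every step.
template <class D, class W, class URBG>
void weighted_shuffle
    ( D first, D last
    , W first_weight, W last_weight
    , URBG&& g )
{
    while (first != last && first_weight != last_weight)
    {
        // Pick an index among the remaining elements with probability
        // weight / (sum of remaining weights).
        std::discrete_distribution dd(first_weight, last_weight);
        auto i = dd(g);
        if ( i )
        {
            // Move the chosen element (and its weight) to the front of the
            // unshuffled part; it is now fixed in its final position.
            std::iter_swap(first, std::next(first, i));
            std::iter_swap(first_weight, std::next(first_weight, i));
        }
        ++first;
        ++first_weight;
    }
}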
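A usage sketch: the function is repeated so the block compiles on its own, and a hypothetical helper first_position_counts shuffles fresh copies many times to check that the front element follows w[i]/sum(w). It assumes the data values are 0..n-1 so they can index the counts.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <random>
#include <vector>

// Same algorithm as weighted_shuffle above: pick the next element with
// probability proportional to the remaining weights, swap it (and its weight)
// to the front of the unshuffled part, advance.
template <class D, class W, class URBG>
void weighted_shuffle(D first, D last, W first_weight, W last_weight, URBG&& g)
{
    while (first != last && first_weight != last_weight) {
        std::discrete_distribution<int> dd(first_weight, last_weight);
        auto i = dd(g);
        if (i) {
            std::iter_swap(first, std::next(first, i));
            std::iter_swap(first_weight, std::next(first_weight, i));
        }
        ++first;
        ++first_weight;
    }
}

// Hypothetical helper: shuffles copies of `data` `trials` times and counts how
// often each value lands in front. Assumes data holds the values 0..n-1.
std::vector<int> first_position_counts(const std::vector<int>& data,
                                       const std::vector<double>& weights,
                                       int trials, unsigned seed)
{
    std::mt19937 rng(seed);
    std::vector<int> counts(data.size(), 0);
    for (int t = 0; t < trials; ++t) {
        std::vector<int> d = data;        // fresh copies: the shuffle permutes
        std::vector<double> w = weights;  // both ranges in place
        weighted_shuffle(d.begin(), d.end(), w.begin(), w.end(), rng);
        ++counts[static_cast<std::size_t>(d.front())];
    }
    return counts;
}
```

Note that the weights range is reordered too, so after the call w[k] is still the weight of d[k]; pass a copy if the original weight order matters.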


Check out Weighted Random Sampling (2005; Efraimidis, Spirakis). If you create a list of values -pow(rand(0,1), 1.0/weights[i]) and sort it in ascending order, you get what you want.
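A short justification for the first position (a standard argument sketched here, not taken from the answer): let the U_i be independent and uniform on (0,1) and write W for the sum of the weights. The key U_i^{1/w_i} has distribution function t^{w_i} on [0,1], and sorting -pow(...) ascending puts the largest key first, so element i comes first with probability exactly w_i/W:

$$
\Pr\left[U_j^{1/w_j} \le t\right] = \Pr\left[U_j \le t^{w_j}\right] = t^{w_j}, \qquad 0 \le t \le 1
$$

$$
\Pr\left[\text{key}_i \text{ is the largest}\right]
= \int_0^1 w_i\, t^{w_i - 1} \prod_{j \ne i} t^{w_j}\, dt
= w_i \int_0^1 t^{W - 1}\, dt
= \frac{w_i}{W}, \qquad W = \sum_j w_j
$$

The cited paper extends this to the full sorted order, which matches the sampling-without-replacement procedure quoted above.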

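Equivalently (and slightly faster), you can build this list with an exponential distribution. Since -log(rand(0,1))/weights[i] = -log(pow(rand(0,1), 1.0/weights[i])) and -log is strictly decreasing on (0,1], sorting these values in ascending order gives the same order as sorting -pow(rand(0,1), 1.0/weights[i]) in ascending order. Moreover, -log(rand(0,1)) is an exponential variate with rate 1, so it can be drawn directly with std::exponential_distribution.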

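#include <algorithm>  // std::sort
#include <cmath>      // std::pow, std::log (commented-out variants)
#include <cstddef>    // size_t
#include <random>
#include <utility>    // std::pair
#include <vector>

std::vector<size_t> weighted_shuffle(std::vector<double> const &weights, std::mt19937 &rng)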
{
    //auto uniform_dist = std::uniform_real_distribution<double>();
    auto exp_dist = std::exponential_distribution<double>();

    std::vector<std::pair<double, size_t>> index_pairs;
    index_pairs.reserve(weights.size());
    for (size_t i=0; i<weights.size(); ++i) {
        double const p = weights[i];
        // from Efraimidis, Spirakis
        //index_pairs.emplace_back(-std::pow(uniform_dist(rng), 1.0/p), i);
        // equivalent and a bit faster
        //index_pairs.emplace_back(-std::log(uniform_dist(rng))/p, i);
        // equivalent and fastest
        index_pairs.emplace_back(exp_dist(rng)/p, i);
    }
    std::sort(index_pairs.begin(), index_pairs.end());

    std::vector<size_t> indices;
    indices.reserve(weights.size());
    for (auto const &[w, i] : index_pairs)
        indices.emplace_back(i);
    return indices;
}
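This version returns an index permutation rather than shuffling the data, so one more step is needed to get the std::shuffle-style result from the question. A sketch: the function is repeated so the block compiles alone, and apply_order is a hypothetical helper name.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <utility>
#include <vector>

// Exponential-key weighted shuffle, as above: key_i = Exp(1) / w_i, sort
// ascending, and the smallest key comes first.
std::vector<std::size_t> weighted_shuffle(std::vector<double> const& weights,
                                          std::mt19937& rng)
{
    std::exponential_distribution<double> exp_dist;  // rate 1
    std::vector<std::pair<double, std::size_t>> keys;
    keys.reserve(weights.size());
    for (std::size_t i = 0; i < weights.size(); ++i)
        keys.emplace_back(exp_dist(rng) / weights[i], i);
    std::sort(keys.begin(), keys.end());
    std::vector<std::size_t> indices;
    indices.reserve(keys.size());
    for (auto const& k : keys)
        indices.push_back(k.second);
    return indices;
}

// Hypothetical helper: reorders `data` according to the index permutation.
template <class T>
std::vector<T> apply_order(std::vector<T> const& data,
                           std::vector<std::size_t> const& order)
{
    std::vector<T> out;
    out.reserve(order.size());
    for (std::size_t i : order)
        out.push_back(data[i]);
    return out;
}
```

Keeping the data untouched and returning indices also makes it easy to apply the same order to several parallel arrays.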