Return list of weighted objects with semi-randomized ranking

Question

Let's say I have a list of objects (in Python) that looks something like this (each entry contains an identifier and a ranking/weighting):

objects = [
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_3", 0.25),
    ("object_4", 0.01),
    ("object_5", 0.99),
]

I want to return the same objects array, but in a semi-randomized order based on the weights. That is, I don't always want to return:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

but would rather allow for some non-determinism, so that the returned array generally looks like the one above but could also look like:

[
    ("object_5", 0.99),
    ("object_1", 0.50),
    ("object_2", 0.75),
    ("object_4", 0.01),
    ("object_3", 0.25),
]

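EDIT: I think my question is different from the other question because the ordering matters here; in the other question, the order doesn't matter (again, I think!).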

Answer 1

If I'm not mistaken, one approach is weighted sampling without replacement:

from random import choices


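def weighted_sample_without_replacement(population, weights, k=1):
    # Draw k distinct items, favouring items with larger weights.
    # Assumes at least k items have a positive weight.
    weights = list(weights)  # copy, because drawn items get their weight zeroed
    positions = range(len(population))
    indices = []
    while True:
        needed = k - len(indices)
        if not needed:
            break
        for i in choices(positions, weights, k=needed):
            # Skip positions that were already drawn (their weight is now 0).
            if weights[i]:
                weights[i] = 0.0
                indices.append(i)
    return [population[i] for i in indices]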


data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

_, weights = zip(*data)
sample = weighted_sample_without_replacement(data, weights, k=len(data))
print(sample)

Output (of a single run)

[('object_2', 0.75), ('object_5', 0.99), ('object_3', 0.25), ('object_1', 0.5), ('object_4', 0.01)]

A basic experimental analysis seems to validate my hypothesis:

from collections import defaultdict
from operator import itemgetter

_, weights = zip(*data)
counts = defaultdict(lambda : defaultdict(int))
for _ in range(1000):
    sample = weighted_sample_without_replacement(data, weights, k=len(data))
    for i, (key, _) in enumerate(sample):
        counts[i][key] += 1

for key, values in counts.items():
    print(key, sorted(values.items(), key=itemgetter(1), reverse=True))

Output (experiment)

0 [('object_5', 415), ('object_2', 290), ('object_1', 186), ('object_3', 106), ('object_4', 3)]
1 [('object_2', 322), ('object_5', 309), ('object_1', 241), ('object_3', 119), ('object_4', 9)]
2 [('object_1', 319), ('object_2', 259), ('object_3', 209), ('object_5', 199), ('object_4', 14)]
3 [('object_3', 533), ('object_1', 239), ('object_2', 126), ('object_5', 75), ('object_4', 27)]
4 [('object_4', 947), ('object_3', 33), ('object_1', 15), ('object_2', 3), ('object_5', 2)]
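
As a sanity check on row 0 above: in weighted sampling without replacement, the first draw picks each item with probability weight / sum(weights), so the expected first-place counts can be worked out directly. A minimal sketch (the names total and expected_first are mine, not from the code above):

```python
data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

runs = 1000
total = sum(weight for _, weight in data)  # 2.5 for this data

# Expected number of times each object lands in position 0 over `runs` draws.
expected_first = {key: runs * weight / total for key, weight in data}

for key, expected in expected_first.items():
    print(key, round(expected))
```

This gives roughly 396, 300, 200, 100 and 4, which is in the same ballpark as the observed 415, 290, 186, 106 and 3.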

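The value 'object_5' is in one of the first two positions 724 times out of 1000, while 'object_4' is in the last position 947 times out of 1000. For a better visualization of the results, see the figure below (the visualization was generated from an additional run of the experimental setup):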

The code for reproducing the experiments can be found here.
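
A loop-free variant, offered as a sketch rather than as part of the method above: the Efraimidis-Spirakis trick gives each item the key u ** (1 / w), with u uniform in [0, 1), and sorts by key in descending order, which yields the same kind of weighted ordering without replacement. The function name weighted_shuffle is hypothetical, and every weight is assumed to be strictly positive.

```python
import random


def weighted_shuffle(items, rng=random):
    """Order (key, weight) pairs so heavier items tend to come first.

    Assumes every weight is strictly positive.
    """
    # Each item gets the random key u ** (1 / w). A larger weight pushes the
    # key toward 1, so sorting by descending key favours heavier items.
    return sorted(
        items,
        key=lambda item: rng.random() ** (1.0 / item[1]),
        reverse=True,
    )


data = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

print(weighted_shuffle(data))
```

The whole ordering comes from a single sort, so there is no retry loop and no mutation of the weights.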

Answer 2

If you can make sure the weight values are always in [0, 1), then the following code will work!

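from random import random
from typing import Any, List, Tuple


def weighted_sample_without_replacement(
    population: List[Tuple[Any, float]], weights: tuple
) -> List[Tuple[Any, float]]:
    # Scale each weight by a uniform random number in [0, 1) and sort by the
    # result. reverse=True puts the larger scaled weights first, so heavier
    # objects tend to lead, as the question asks. `weights` is unused here.
    return sorted(population, key=lambda x: x[1] * random(), reverse=True)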

where population looks like:

[
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

and weights like:

(
    0.99,
    0.75,
    0.50,
    0.25,
    0.01,
)
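
To get a feel for how this ordering behaves, here is a small tally in the style of the experiment from Answer 1. It is only a sketch: jittered_order is a hypothetical name, and it sorts with reverse=True on the assumption that heavier objects should tend to come first, as in the question.

```python
from collections import Counter
from random import random, seed


def jittered_order(population):
    # Scale each weight by a fresh uniform number in [0, 1) and sort by the
    # result, largest first (assumption: heavier objects should lead).
    return sorted(population, key=lambda x: x[1] * random(), reverse=True)


population = [
    ("object_5", 0.99),
    ("object_2", 0.75),
    ("object_1", 0.50),
    ("object_3", 0.25),
    ("object_4", 0.01),
]

seed(42)  # fixed seed only so the tally is repeatable
first_place = Counter(jittered_order(population)[0][0] for _ in range(1000))
for key, count in first_place.most_common():
    print(key, count)
```

Don't expect the counts to line up with the table in Answer 1: scaling by a uniform number is a different randomization than sampling without replacement, so the position frequencies differ.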

Tags: python, random, list, ranking, weighted