How to return index of an element with probability of the element's value divided by sum of array

Question

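Given an array and a value k, write a function that returns the index of the element equal to k with probability k/sum(input array). Assume there are no duplicate numbers in the input array.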

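For example, if the input array is [1, 4, 2, 3], the function should behave as follows:

return 0 with probability 1/10;

return 1 with probability 4/10;

return 2 with probability 2/10;

return 3 with probability 3/10.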

Question 2: How should it be handled if there are duplicates in the array?

I was thinking that binary search is good for finding an element in an array, but I haven't figured out how to connect it with probability.

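Edited: As suggested, this question is similar to mine. However, its solution is not what I expected. I was looking for a solution that embeds binary search, which could potentially reduce the time complexity.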

A good solution on how to use binary search to find the first element in a sorted array that is greater than a given key.

Answer 1

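Sum all the elements (call the sum S) and then generate a random number r from 1 to S. Then iterate over the numbers a_i. If a_i is not less than r, return its index i. Otherwise subtract a_i from r. Continue until a value is returned. If you only have a single query, you will not be able to improve on this solution.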
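The scan described above could be sketched in Python roughly as follows. This is only an illustration, assuming the weights are positive integers; the function name `sample_index_linear` and the optional `r` argument are hypothetical and exist just so the scan can be checked deterministically.

```python
import random


def sample_index_linear(a, r=None):
    """Return index i with probability a[i] / sum(a) using a linear scan.

    `r` is a hypothetical optional argument (not part of the answer); when it
    is omitted, r is drawn uniformly from 1..S, both ends inclusive.
    """
    total = sum(a)
    if r is None:
        r = random.randint(1, total)
    for i, weight in enumerate(a):
        if weight >= r:   # r falls inside the slice owned by index i
            return i
        r -= weight       # skip past this slice and keep scanning
    raise ValueError("r must lie in 1..sum(a)")
```

Feeding every r from 1 to 10 through the scan for [1, 4, 2, 3] yields index 0 once, index 1 four times, index 2 twice and index 3 three times, which matches the probabilities asked for in the question.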

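Edit (credit to JuanLopez): however, if you are going to answer multiple queries, you can precompute prefix sums and combine them with binary search to find the largest x such that $\sum_{i=0}^{x} a_i < r$; the index to return is then x + 1 (or 0 if no such x exists). Note that once the prefix sums are precomputed, $\sum_{i=0}^{x} a_i$ can be looked up in constant time.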
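One way to picture the multi-query variant is the sketch below. It assumes positive integer weights, and the helper names `build_prefix`, `locate` and `sample_index` are made up for illustration. `bisect.bisect_left` returns the number of prefix sums strictly smaller than r, which is exactly x + 1 in the notation above.

```python
import bisect
import itertools
import random


def build_prefix(a):
    """One-time O(n) precomputation: prefix[x] = a[0] + ... + a[x]."""
    return list(itertools.accumulate(a))


def locate(prefix, r):
    """Index of the first prefix sum that is >= r, found in O(log n)."""
    return bisect.bisect_left(prefix, r)


def sample_index(prefix):
    """Answer one query against an already-built prefix array."""
    return locate(prefix, random.randint(1, prefix[-1]))


prefix = build_prefix([1, 4, 2, 3])   # [1, 5, 7, 10]
```

After the single precomputation, each call to `sample_index(prefix)` costs one random draw plus one binary search.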

Answer 2

Given an array and a value k, write a function to return index of element that equals to k with the probability of k/sum(input array)

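You can reduce the problem to uniform sampling on [1, sum]. The idea is to build the cumulative list cum_distr of the initial list, sample a number r uniformly in [1, sum], and find the lowest i such that r <= cum_distr[i].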

import random


def get_cum_distr(distr):
    # Running totals: cum_distr[i] = distr[0] + ... + distr[i]
    cum_distr = []
    total = 0  # avoid shadowing the built-in sum()
    for weight in distr:
        total += weight
        cum_distr.append(total)
    return cum_distr


def sampler(cum_distr):
    r = random.randint(1, cum_distr[-1])
    i = 0
    while r > cum_distr[i]:
        i += 1
    return i


distr = [1, 4, 2, 3]
cum_distr = get_cum_distr(distr)
# test sampler
sample_size = 100000
count = dict()
for _ in range(sample_size):
    r = sampler(cum_distr)
    count[r] = count.get(r, 0) + 1
print(count)
# example output: {0: 9996, 1: 40115, 2: 19934, 3: 29955}

Note that if the search for the index turns out to be costly, you can use binary search instead, since cum_distr is non-decreasing.
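A hand-written binary search for that lookup might look like the sketch below. The names `first_index_at_least` and `sampler_binary` are hypothetical, and the code assumes cum_distr is non-decreasing with 1 <= r <= cum_distr[-1].

```python
import random


def first_index_at_least(cum_distr, r):
    """Smallest i with cum_distr[i] >= r (assumes r <= cum_distr[-1])."""
    lo, hi = 0, len(cum_distr) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if cum_distr[mid] >= r:
            hi = mid        # mid may be the answer, so keep it in range
        else:
            lo = mid + 1    # the answer lies strictly to the right of mid
    return lo


def sampler_binary(cum_distr):
    """Same contract as sampler(), with an O(log n) lookup."""
    r = random.randint(1, cum_distr[-1])
    return first_index_at_least(cum_distr, r)
```

The standard library's `bisect.bisect_left` does the same job; writing the loop out is only meant to show why the non-decreasing property is enough.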

How to deal with it if there are duplicates in the array?

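It does not matter: each index still owns its own interval in the cumulative list, so it is returned with probability proportional to its own value.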
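To see why duplicates are harmless, here is a small deterministic check. The input [2, 2, 1] is a made-up example, and the check enumerates every possible r instead of sampling.

```python
import bisect
import itertools
from collections import Counter

a = [2, 2, 1]                              # the value 2 appears twice
prefix = list(itertools.accumulate(a))     # [2, 4, 5]

# Count how many values of r in 1..sum(a) land on each index.
hits = Counter(bisect.bisect_left(prefix, r) for r in range(1, prefix[-1] + 1))
print(hits)
```

Indices 0 and 1 each receive 2 of the 5 possible values of r and index 2 receives 1, so the two equal elements are simply treated as separate slots with probability 2/5 each.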

Answer 3

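You can create a cumulative array from the input, where B[i] = A[0] + A[1] + ... + A[i]. Generate a random int x between 1 and sum(A), then binary search B for the first element not less than x.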

Here's an example in Python (using Python's bisect module, which is essentially a binary search).

import random, bisect, collections

def make_random(A):
    s = sum(A)
    # B holds the cumulative sums of A
    B = list(A)
    for i in range(1, len(B)):
        B[i] += B[i-1]
    def fn():
        r = random.randint(1, s)
        # index of the first cumulative sum that is not less than r
        return bisect.bisect_left(B, r)
    return fn

rnd = make_random([1,4,2,3])

c = collections.Counter()
for i in range(10000):
    c[rnd()]+=1

print(c)

The result will look like this:

Counter({1: 3960, 3: 3036, 2: 1992, 0: 1012})

Answer 4

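This looks like the naive sampler (and it is), but there is a subtlety in the order in which the elements are checked. By putting the largest weights first, the loop will often finish after only a few iterations. So this method could be faster on average if the distribution is very skewed.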

[I used this trick to sample from the random vectors used in the Markov nodes of Wakkerbot]

#include <stdio.h>
#include <stdlib.h>

struct samp {
    int ret;
    unsigned weight;
} array[4] = { {1, 4}, {3, 3}, {2, 2}, {0, 1} };

unsigned sumweight = 10;

/* this is a *terrible* way to obtain a uniform random value */
#define urand(n) (random() % (n))

int sample(void)
{
    unsigned idx, val;

    val = urand(sumweight);

    for (idx = 0; idx < 4; idx++) {
        if (val < array[idx].weight) return array[idx].ret;
        val -= array[idx].weight;
    }
    return -1;
}

int main(void)
{
    int ret;
    unsigned loop;

    for (loop = 0; loop < 20; loop++) {
        ret = sample();
        printf("%u: %d\n", loop, ret);
    }
    return 0;
}
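The effect of the ordering can be checked with a small Python sketch that counts how many elements the scan inspects on average. The helper names `scan_steps` and `average_steps` and the skewed weights [1, 1, 1, 97] are invented for this illustration; r is taken in 0..S-1 to mirror the C code.

```python
def scan_steps(weights, r):
    """Number of elements the linear scan inspects for a given r in 0..S-1."""
    for steps, w in enumerate(weights, start=1):
        if r < w:
            return steps
        r -= w
    raise ValueError("r must lie in 0..sum(weights)-1")


def average_steps(weights):
    """Exact average over all equally likely values of r."""
    total = sum(weights)
    return sum(scan_steps(weights, r) for r in range(total)) / total


skewed = [1, 1, 1, 97]
print(average_steps(skewed))                        # 3.94
print(average_steps(sorted(skewed, reverse=True)))  # 1.06
```

For the question's own array the gain is smaller: the average drops from 2.7 inspections in the original order [1, 4, 2, 3] to 2.0 when sorted as [4, 3, 2, 1].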

Tags: arrays, algorithm, probability, binary-search