关于有序对上的随机样本的算法设计手册(Steven Skiena)第 250 页

Algorithm design manual (Steven Skiena) regarding random samples on Ordered pairs Page 250

给出以下解释

Problem: We need an efficient and unbiased way to generate random pairs of vertices to perform random vertex swaps. Propose an efficient algorithm to generate elements from the (n 2) unordered pairs on {1, . . . , n} uniformly at random.

Solution: Uniformly generating random structures is a surprisingly subtle problem. Consider the following procedure to generate random unordered pairs: i = random int(1,n-1); j = random int(i+1,n);

It is clear that this indeed generates unordered pairs, since i < j. Further, it is clear that all (n 2) unordered pairs can indeed be generated, assuming that random int generates integers uniformly between its two arguments.

But are they uniform? The answer is no. What is the probability that pair (1,2) is generated? There is a 1/(n−1) chance of getting the 1, and then a 1/(n−1) chance of getting the 2, which yields p(1,2) = 1/(n − 1)2. But what is the probability of getting (n − 1,n)? Again, there is a 1/n chance of getting the first number, but now there is only one possible choice for the second candidate! This pair will occur n times more often than the first! The problem is that fewer pairs start with big numbers than little numbers. We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i)) and appropriately bias the probability. The second value could then be selected uniformly at random from i + 1 to n. But instead of working through the math, let’s exploit the fact that randomly generating the n2 ordered pairs uniformly is easy. Just pick two integers independently of each other. Ignoring the ordering (i.e. , permuting the ordered pair to unordered pair (x,y) so that x < y) gives us a 2/n^2 probability of generating each unordered pair of distinct elements. If we happen to generate a pair (x,x), we discard it and try again. We will get unordered pairs uniformly at random in constant expected time using the following algorithm:

  1. 在上面的段落中“问题是更少的对开始于 大数多于小数。”这不应该是更多的对而不是更少的对

  2. 在上面的段落中 "We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i))" 不应该是我有多少无序对而不是多少无序对

编辑

  1. 在上面的段落中“忽略顺序(即 , 将有序对置换为无序对 (x,y) 使得 x < y) 给我们生成每个无序对的 2/n^2 概率 不同的元素。”概率 2/n^2 是如何导出的?

谢谢

in the above paragraph "The problem is that fewer pairs start with big numbers than little numbers." shouldn't this be more pairs instead of fewer pairs

不,是少了。:

n - 1 pairs start with 1 (1 2; 1 3; ...; 1 n)
n - 2 pairs start with 2 (2 3; 2 4; ...; 2 n)
n - 3 pairs start with 3
...

in the above paragraph "We could solve this problem by calculating exactly how unordered pairs start with i (exactly (n − i))" shouldn't this me how many unordered pairs rather than how unordered pairs

是的,这里少了一个 "many"。

in the above paragraph "Ignoring the ordering (i.e. , permuting the ordered pair to unordered pair (x,y) so that x < y) gives us a 2/n^2 probability of generating each unordered pair of distinct elements." how is the probability 2/n^2 derived ?

n*n 种可能性生成对,其中顺序很重要(1 22 1 是不同的对)。由于您随后继续忽略排序,因此 1 22 1 将相同,因此您有两个有利的情况。

这并不能说明您丢弃 x x 对的事实。那么就是2 / (n*(n - 1)),因为如果你选了一次x,你只有n - 1次选的可能。

假设您的 n 个项目的索引为 0..(n-1),并且 random(n) 给出一个 ≥ 0 且 < n 的随机数:

i = random(n)
j = random(n-1)
j = (i+j+1) % n

现在 i≠j 的每一对 (i,j) 都有恰好概率 1/(n(n-1))。显然,交换 (i,j) 与交换 (j,i) 的结果相同。

你也可以这样做:

i = random(n)
j = random(n)

并忽略这可能导致 (i,i) 对的事实(交换它们将无效)。