Random contiguous slice of list in Python based on a single random integer

Question

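Using a single random number and a list, how would you return a random slice of that list?

For example, given the list [0,1,2] there are 7 possible contiguous slices: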

[ ]
[ 0 ]
[ 0, 1 ]
[ 0, 1, 2 ]
[ 1 ]
[ 1, 2]
[ 2 ]

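Rather than getting a random starting index and a random end index, there must be a way to generate a single random number and use that one value to figure out both the starting index and the end/length.

I need it that way to ensure that these 7 possibilities have equal probability.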

Answer 1

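First create all possible slice index couples.

Slices such as [0:0], [1:1], etc. are all equivalent (they are empty), so we include only one of them.

Finally you pick a random index couple and apply it.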

import random

l = [0, 1, 2]

combination_couples = [(0, 0)]
length = len(l)

# Creates all index couples.
for j in range(1, length+1):
    for i in range(j):
        combination_couples.append((i, j))

print(combination_couples)

rand_tuple = random.sample(combination_couples, 1)[0]
final_slice = l[rand_tuple[0]:rand_tuple[1]]

print(final_slice)

To make sure we got them all:

for i in combination_couples:
    print(l[i[0]:i[1]])
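
Since the question asks for a single random integer, the table of index couples can also be indexed directly by that integer. This is a minimal sketch, assuming Python 3; `index_couples` and `slice_from_integer` are hypothetical helper names, not part of the original answer:

```python
import random
from itertools import combinations


def index_couples(n):
    """Return (0, 0) followed by every (i, j) with 0 <= i < j <= n."""
    return [(0, 0)] + list(combinations(range(n + 1), 2))


def slice_from_integer(l, r):
    """Map one integer r in [0, n*(n+1)//2] to a contiguous slice of l."""
    i, j = index_couples(len(l))[r]
    return l[i:j]


l = [0, 1, 2]
total = len(l) * (len(l) + 1) // 2 + 1   # 7 slices for a 3-element list
r = random.randrange(total)              # the single random integer
print(slice_from_integer(l, r))
```

Every value of r selects a different slice, so a uniformly distributed r gives each slice the same probability.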

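Or, with some math...

For a list of length 3 the possible index numbers run from 0 to 3, that is n=4. You pick 2 of them, that is k=2. The first index has to be smaller than the second, so we need to calculate the number of combinations (n choose k).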

from math import factorial as f

def total_combinations(n, k=2):
    result = 1

    for i in range(1, k+1):
        result *= n - k + i
    # Integer division keeps the count an int on Python 3 as well.
    result //= f(k)
    # We add plus 1 since we included [0:0] as well.
    return result + 1

print(total_combinations(n=4))    # Prints 7 as expected.
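
The same count can be cross-checked against the closed form length*(length+1)/2 + 1 for a list with `length` elements. This is a short sketch, assuming Python 3.8 or later for `math.comb`; `count_slices` is a name chosen here for illustration:

```python
from math import comb


def count_slices(length):
    """Count the contiguous slices of a list, with the empty slice counted once."""
    # Choose 2 distinct boundaries out of length + 1, plus 1 for the empty slice.
    return comb(length + 1, 2) + 1


for length in range(6):
    assert count_slices(length) == length * (length + 1) // 2 + 1

print(count_slices(3))  # prints 7
```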

Answer 2

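It is a little strange to give the empty list equal weight with the others. If there are n elements in the list, it is more natural for the empty list to be given weight 0 or n+1 times the others. But if you want it to have equal weight, you can do that.

There are n*(n+1)/2 nonempty contiguous sublists. You can specify these by the end point, from 0 to n-1, and the starting point, from 0 to the end point.

Generate a random integer x from 0 to n*(n+1)/2, inclusive.

If x=0, return the empty list. Otherwise, x is uniformly distributed from 1 through n(n+1)/2.

Compute e = floor(sqrt(2*x)-1/2). This takes the values 0, 1, 1, 2, 2, 2, 3, 3, 3, 3, etc.

Compute s = (x-1) - e*(e+1)/2. This takes the values 0, 0, 1, 0, 1, 2, 0, 1, 2, 3, ...

Return the interval starting at index s and ending at index e.

(s,e) takes the values (0,0),(0,1),(1,1),(0,2),(1,2),(2,2),...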

import random
import math

n = 10
items = list(range(n))

# Integer division keeps the upper bound an int, as randint requires.
x = random.randint(0, n*(n+1)//2)

if x == 0:
    print(items[0:0])  # empty list
else:
    e = int(math.floor(math.sqrt(2*x) - 0.5))
    s = x - 1 - e*(e+1)//2
    print(items[s:e+1])  # starting at s, ending at e, inclusive
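
To check that this mapping hits every slice exactly once, the computation can be wrapped in a function and enumerated. This is a sketch, assuming Python 3.8 or later; `endpoints_from_x` is a hypothetical name, and it swaps the floating-point sqrt for `math.isqrt` (an integer-only equivalent, not what the answer itself uses) so that rounding error cannot matter for large n:

```python
from math import isqrt


def endpoints_from_x(x):
    """Map x >= 1 to (s, e), the inclusive start and end of a non-empty slice."""
    e = (isqrt(8 * x) - 1) // 2      # integer form of floor(sqrt(2*x) - 1/2)
    s = (x - 1) - e * (e + 1) // 2
    return s, e


n = 3
items = list(range(n))
slices = [[]]                        # x == 0 is the empty slice
for x in range(1, n * (n + 1) // 2 + 1):
    s, e = endpoints_from_x(x)
    slices.append(items[s:e + 1])

print(slices)
```

For n = 3 this lists all 7 slices, ordered by end point first and then by starting point.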

Answer 3

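Simply fix one order in which you would sort all possible slices, then work out a way to turn an index in that list of all slices back into the slice endpoints. For example, the order you used could be described by

The empty slice comes before all other slices
Non-empty slices are ordered by their starting point
Slices with the same starting point are ordered by their end point

So index 0 should return the empty list. Indices 1 through n should return [0:1] through [0:n]. Indices n+1 through n+(n-1)=2n-1 would be [1:2] through [1:n]; 2n through n+(n-1)+(n-2)=3n-3 would be [2:3] through [2:n], and so on. You see a pattern here: the last index for a given starting point is of the form n+(n-1)+(n-2)+(n-3)+…+(n-k), where k is the starting index of the sequence. That's an arithmetic series, so the sum is (k+1)(2n-k)/2=(2n+(2n-1)k-k²)/2. If you set that term equal to a given index and solve for k, you get a formula involving square roots. You can then use the ceiling function to turn that into an integer value of k, the starting point of the slice with the given index. And once you know k, computing the end point is rather easy.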
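
Spelled out, the step of setting the sum equal to an index i and solving for k looks as follows. This is a sketch of the derivation, taking the smaller root of the quadratic (the one that lies in the valid range 0 ≤ k ≤ n-1):

```latex
\begin{align*}
\frac{(k+1)(2n-k)}{2} &= i \\
k^2 - (2n-1)\,k + 2(i-n) &= 0 \\
k &= \frac{(2n-1) - \sqrt{(2n-1)^2 - 8(i-n)}}{2}
\end{align*}
```

Rounding this value up with the ceiling function gives the starting index for any i between 1 and n(n+1)/2. For n = 3 and i = 4, the root is (5 - sqrt(17))/2 ≈ 0.44, so k = 1 and the slice starts at index 1.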

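But the quadratic equation in the solution above makes things really ugly. So you might be better off using some other order. Right now I can't think of a way which would avoid such a quadratic term. The order Douglas used in his answer doesn't avoid square roots, but at least his square root is a bit simpler due to the fact that he sorts by end point first. The order in your question and my answer is called lexicographical order; his would be called reverse lexicographical and is often easier to handle since it doesn't depend on n. But since most people think about normal (forward) lexicographical order first, this answer might be more intuitive to many and might even be the required way for some applications.

Here is a bit of Python code which lists all sequence elements in order, and does the conversion from index i to endpoints [k:m] the way I described above: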

from math import ceil, sqrt
n = 3
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    b = 1 - 2*n
    c = 2*(i - n) - 1
    # solve k^2 + b*k + c = 0
    k = int(ceil((- b - sqrt(b*b - 4*c))/2.))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))

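The - 1 term in c doesn't come from the mathematical formula I presented above. It's more like subtracting 0.5 from each value of i. This ensures that even if the result of sqrt is slightly too large, you won't end up with a k which is too large. So that term accounts for numeric imprecision and should make the whole thing pretty robust.

The term k*(2*n-k+1)//2 is the last index belonging to starting point k-1, so i minus that term is the length of the subsequence under consideration.

You can simplify things further. You can perform some of the computation outside the loop, which might be important if you have to choose random sequences repeatedly. You can divide b by a factor of 2 and then get rid of that factor in a number of other places. The result could look like this: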

from math import ceil, sqrt
n = 3
b = n - 0.5
bbc = b*b + 2*n + 1
print("{:3} []".format(0))
for i in range(1, n*(n+1)//2 + 1):
    k = int(ceil(b - sqrt(bbc - 2*i)))
    m = k + i - k*(2*n-k+1)//2
    print("{:3} [{}:{}]".format(i, k, m))
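
The conversion can also be packaged as a function and compared against a brute-force enumeration in lexicographical order. This is a sketch, assuming Python 3; `bounds_from_index` is a hypothetical name, and the formula is the simplified one from the code above:

```python
import random
from math import ceil, sqrt


def bounds_from_index(i, n):
    """Map index i in [0, n*(n+1)//2] to slice bounds (k, m), i.e. l[k:m]."""
    if i == 0:
        return 0, 0                       # the empty slice comes first
    b = n - 0.5
    k = int(ceil(b - sqrt(b * b + 2 * n + 1 - 2 * i)))
    m = k + i - k * (2 * n - k + 1) // 2
    return k, m


n = 3
# Brute force: empty slice, then non-empty slices by start, then by end.
expected = [(0, 0)] + [(k, m) for k in range(n) for m in range(k + 1, n + 1)]
assert [bounds_from_index(i, n) for i in range(len(expected))] == expected

l = list(range(n))
k, m = bounds_from_index(random.randint(0, n * (n + 1) // 2), n)
print(l[k:m])
```

The final lines draw one random integer and turn it into a slice, which is the use case from the question.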

Answer 4

there must be a way to generate a single random number and use that one value to figure out both starting index and end/length.

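It is difficult to say which method is best, but if you are only interested in binding a single random number to a contiguous slice, you can use modulo.

Given a list l and a single random number r, you can get a contiguous slice like this: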

l[r % len(l) : some_sparkling_transformation(r) % len(l)]

where some_sparkling_transformation(r) is essential. It depends on your needs, but since I see no special requirements in your question, it could be, for example:

l[r % len(l) : (2 * r) % len(l)]

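The most important thing here is that both the left and right edges of the slice are correlated with r. This makes it hard to define contiguous slices that don't follow any observable pattern. The example above (with 2 * r) produces slices that are always either empty lists or follow the pattern [a : 2 * a].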
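
The limited variety of that example is easy to see by enumerating it. This is a small sketch, assuming the list [0, 1, 2] from the question; `slice_2r` is a hypothetical helper name:

```python
def slice_2r(l, r):
    """Slice l with both edges derived from r, using 2 * r for the right edge."""
    return l[r % len(l):(2 * r) % len(l)]


l = [0, 1, 2]
# Collect the distinct slices produced over many values of r.
distinct = {tuple(slice_2r(l, r)) for r in range(1000)}
print(sorted(distinct))
```

For this list only the empty slice and [1] ever appear, far short of the 7 possible slices.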

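Let's use some intuition. We know that we want a good random representation of the number r in the form of a contiguous slice. It turns out that we need to find two numbers, a and b, which are respectively the left and right edges of the slice. Assuming that r is a good random number (we like it in some way), we can say that a = r % len(l) is a good approach.

Let's now try to find b. The best way to generate another nice random number is to use a random number generator (random or numpy) that supports seeding (both do). Example with the random module: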

import random
def contiguous_slice(l, r):
    random.seed(r)
    a = int(random.uniform(0, len(l)+1))
    b = int(random.uniform(0, len(l)+1))
    a, b = sorted([a, b])
    return l[a:b]
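
It helps to look at how the two sorted edges are distributed. The sketch below assumes that a and b are independent and uniform over the integers 0..len(l), an idealization of int(random.uniform(0, len(l)+1)), and counts how many of the equally likely (a, b) pairs lead to each slice:

```python
from collections import Counter
from itertools import product

l = [0, 1, 2]
counts = Counter()
# Enumerate every equally likely (a, b) pair instead of sampling.
for a, b in product(range(len(l) + 1), repeat=2):
    lo, hi = sorted((a, b))
    counts[tuple(l[lo:hi])] += 1

for piece, count in sorted(counts.items()):
    print(list(piece), count)
```

Under that assumption the empty slice arises from 4 of the 16 pairs and each non-empty slice from 2, so the seven slices are not equally likely with this approach.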

Good luck and have fun!

