在 Python 中恰好有 k 个 HEADS 的 n 次抛掷中的 HEADS 和 TAILS 的排列列表

Question

我正在尝试编写一些 python 代码来列出给定 n 次抛掷和正好 k 次正面朝上的不同排列。例如。对于 n=6 次抛球和 k=3 次正面朝上，我想要的输出是 ['000111', '001011', '001101', '001110', '010011', '010101', '010110', '011001', '011010', '011100', '100011', '100101', '100110', '101001', '101010', '101100', '110001', '110010', '110100', '111000'] （注意：这不是关于计算概率的问题，没有问题。我的问题是为 n 次抛掷和正好 k 次正面生成所有不同的排列。

我知道最终字符串的数由二项式系数也就是组合又名 n_choose_k 给出（例如 scipy.special.comb）。但是，我在生成这些不同选项中的每一个时都遇到了问题，因为抛出的次数超过适度（n> 10）。我的第一个天真的方法是生成一个字符串的所有可能排列，例如 "1111100000"（使用 itertools.permutations），然后我使用 set() 删除重复项。这适用于低 n (n<=10)。然而，这是非常低效的。即使使用 n=10 & k=5，排列数也为 3,628,800，而组合数仅为 252。在 n>10 之后，这种情况逐渐停止。我知道 itertools 也有 combinations 生成器（和 _with_replacements）。但是，我不确定是否可以使用它以我需要的方式生成输出，因为我的问题并不是从长度为 n 的集合中准确地选择 k 个元素的子集，其中顺序无关紧要。

我的初始代码如下。

from itertools import permutations
import scipy.special

n = 10
k = n//2
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))

U,D = '1','0'
seed = U * k + D * (n-k)
permlist = [''.join(p) for p in permutations(seed)]
permset = sorted(list(set(permlist)))
print(permset)
print('len permutations:{}, len set:{}'.format(len(permlist), len(permset)))

注意：我对 n 次掷球和正好 k 次正面很感兴趣，尽管我很想知道如何将解决方案至少扩展到也有 k 个头。

更新：

我接受了karl-knechtel's 。但是对于那些好奇的人，这里是所有三种方法运行次（在配备 i7-9750H @2.6Ghz 的笔记本电脑上）

结果：

10 choose 5 = 252.0
Permutations: 252 items. Calculated in 1.0340404510498047s
Combinations: 252 items. Calculated in 0.00103759765625s
Binary ints: 252 items. Calculated in 0.002962350845336914s

12 choose 6 = 924.0
Permutations: Gave up after 30 seconds.
Combinations: 924 items. Calculated in 0.001998424530029297s
Binary ints: 924 items. Calculated in 0.023003339767456055s

14 choose 7 = 3432.0
Permutations: Gave up after 30 seconds.
Combinations: 3432 items. Calculated in 0.011020183563232422s
Binary ints: 3432 items. Calculated in 0.10396194458007812s

16 choose 8 = 12870.0
Permutations: Gave up after 30 seconds.
Combinations: 12870 items. Calculated in 0.049001455307006836s
Binary ints: 12870 items. Calculated in 0.42699623107910156s

20 choose 10 = 184756.0
Permutations: Gave up after 30 seconds.
Combinations: 184756 items. Calculated in 0.6319949626922607s
Binary ints: 184756 items. Calculated in 7.8870697021484375s

22 choose 11 = 705432.0
Permutations: Gave up after 30 seconds.
Combinations: 705432 items. Calculated in 2.974030017852783s
Binary ints: Gave up after 30 seconds.

24 choose 12 = 2704156.0
Permutations: Gave up after 30 seconds.
Combinations: 2704156 items. Calculated in 11.795861721038818s
Binary ints: Gave up after 30 seconds.

26 choose 13 = 10400600.0
Permutations: Gave up after 30 seconds.
Combinations: 10400600 items. Calculated in 51.053600549697876s
Binary ints: Gave up after 30 seconds.

代码：

import numpy as np
from itertools import permutations, combinations
import scipy.special
import time

n = 26
k = n//2

test_permutations = n <= 10
test_combinations = True
test_binaryints = n <= 20

test_print_outputs = n <= 6

n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))

U,D = '1','0'

def calc_permutations():
    seed = U * k + D * (n-k)
    l = [''.join(p) for p in permutations(seed)]
    return list(set(l))


def calc_combinations():
    '''Suggestion from https://whosebug.com/users/523612/karl-knechtel
    
    '''
    def bit_strings(size, one_count):
        for one_indices in combinations(range(size), one_count):
            yield ''.join(U if i in one_indices else D for i in range(size))

    return list(bit_strings(n, k))


def calc_binaryints():
    '''
    Suggestion from https://whosebug.com/users/9046247/aramakus
    
    '''
    def check_bits(num, k, n):
        """ True if `k` ones inside the number.
        From https://whosebug.com/users/7505395/patrick-artner
        
        """
        return sum( (num & 2**rank)>>rank for rank in range(n)) == k
        
    def to_bin(i):
        return format(i, '0'+str(n) + 'b')

    start_int = 2**k-1
    end_int = 2**n
    l = [to_bin(i) for i in range(start_int, end_int) if check_bits(i,k,n)]
    return l


def test_method(name, fn, do_test, sort=True):
    print(name + ':', end='')
    if do_test:
        start_time = time.time()
        l = fn()
        duration = time.time() - start_time
        if sort: l.sort()
        print(' {} items. Calculated in {}s'.format(len(l), duration))
        if test_print_outputs: print(l)
    else:
        print(' Gave up after 30 seconds.')

test_method('Permutations', calc_permutations, test_permutations)
test_method('Combinations', calc_combinations, test_combinations)
test_method('Binary ints', calc_binaryints, test_binaryints)

Answer 1

这不是一个公平的解决方案，但您可以计算一个 2**k-1 并将 0 和 2**k-1 之间的所有整数以二进制格式写入。此列表将包含所有 0 和 1 的组合，共有 k 个数字。

Answer 2

显示整个排列列表（作为字符串）是一个很大的内存消耗。您可以使用@aramakus 的修改版本，通过使用整数和检查器使用位移位来仅生成具有 k "ones":

def check(num, k, n):
    """ True if `k` ones inside the number."""
    # uses bit shifting and masking to get number of 1s
    return sum( (num & 2**rank)>>rank for rank in range(n)) == k

def pretty_print(l, n):
    """Pretty printer for list n (integer elements) padded to n digits"""
    for num in l:
        print((f'{num:0{n}b}'))
        # or without formatting mini language
        # print(('000000000'+bin(num)[2:])[-n:])

n = 15     # tested for up to 18 (which takes ~25s on my laptop)
k = n//2  
nums = [i for i in range(2**n) if check(i,k,n)]

pretty_print(nums,n) 

# total numbers is just 2**n, the amount of them with k ones is countable
print(f'exactly {k} ones: {len(nums)} out of {2**n} numbers')

输出：

000000001111111
000000010111111
000000011011111
000000011101111
000000011110111
...
111111001000000
111111010000000
111111100000000
exactly 7 ones: 6435 out of 32768 numbers

使用 time.time() 的大致计算时间（因此不是 repeated/averaged 多次运行的时间，并且肯定涉及相当多的错误度量）：

 # You should do this using timeit.timeit() for serious measurements

 n   k   time in seconds
 5   2   0.0019998550415039062
 6   3   0.004002809524536133
 7   3   0.009006261825561523
 8   4   0.020014286041259766
 9   4   0.04403090476989746
10   5   0.09506702423095703
11   5   0.20834732055664062
12   6   0.4523327350616455
13   6   0.9736926555633545
14   7   2.0954811573028564
15   7   4.479296922683716
16   8   9.40683913230896
17   8   19.881306886672974
18   9   41.978920459747314

# somewhat "linear" calculation time

Answer 3

However, I'm not sure if it's possible to use it to generate output in the way that I require, since my problem isn't exactly selecting a subset of k elements from a set of length n, where order does not matter.

其实有点横向思维。您想要 select 的集合是索引的集合，其中 1 应该出现在给定的输出中。

因此：使用 itertools.combinations 确定 1 应该去的索引（我们从 n 可能的索引值中选择 k 值 - 0 到 n-1, inclusive - 没有替换；这正是组合），然后为每个生成字符串。例如，作为生成器：

def bit_strings(size, one_count):
    for one_indices in itertools.combinations(range(size), one_count):
        yield ''.join('1' if i in one_indices else '0' for i in range(size))

>>> len(list(bit_strings(20, 10))) # takes a bit less than a second on my machine
184756

当然，这仍然比直接计算组合数要慢得多（字面上！）。

在 Python 中恰好有 k 个 HEADS 的 n 次抛掷中的 HEADS 和 TAILS 的排列列表

list of permutations of HEADS and TAILS in n tosses with exactly k HEADS in Python

python

combinations

probability

permutation

combinatorics

更新：

结果：

代码：