在 Python 中恰好有 k 个 HEADS 的 n 次抛掷中的 HEADS 和 TAILS 的排列列表

list of permutations of HEADS and TAILS in n tosses with exactly k HEADS in Python

我正在尝试编写一些 python 代码来列出给定 n 次抛掷和正好 k 次正面朝上的不同排列。例如。对于 n=6 次抛球和 k=3 次正面朝上,我想要的输出是 ['000111', '001011', '001101', '001110', '010011', '010101', '010110', '011001', '011010', '011100', '100011', '100101', '100110', '101001', '101010', '101100', '110001', '110010', '110100', '111000'] (注意:这不是关于计算概率的问题,没有问题。我的问题是为 n 次抛掷和正好 k 次正面生成所有不同的排列。

我知道最终字符串的 由二项式系数也就是组合又名 n_choose_k 给出(例如 scipy.special.comb)。但是,我在生成这些不同选项中的每一个时都遇到了问题,因为抛出的次数超过适度(n> 10)。我的第一个天真的方法是生成一个字符串的所有可能排列,例如 "1111100000"(使用 itertools.permutations),然后我使用 set() 删除重复项。这适用于低 n (n<=10)。然而,这是非常低效的。即使使用 n=10 & k=5,排列数也为 3,628,800,而组合数仅为 252。在 n>10 之后,这种情况逐渐停止。我知道 itertools 也有 combinations 生成器(和 _with_replacements)。但是,我不确定是否可以使用它以我需要的方式生成输出,因为我的问题并不是从长度为 n 的集合中准确地选择 k 个元素的子集,其中顺序无关紧要。

我的初始代码如下。

from itertools import permutations
import scipy.special

n = 10
k = n//2
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))

U,D = '1','0'
seed = U * k + D * (n-k)
permlist = [''.join(p) for p in permutations(seed)]
permset = sorted(list(set(permlist)))
print(permset)
print('len permutations:{}, len set:{}'.format(len(permlist), len(permset)))

注意:我对 n 次掷球和 正好 k 次正面很感兴趣,尽管我很想知道如何将解决方案至少扩展到 也有 k 个头。


更新:

我接受了karl-knechtel's 。但是对于那些好奇的人,这里是所有三种方法 运行 次(在配备 i7-9750H @2.6Ghz 的笔记本电脑上)

结果:

10 choose 5 = 252.0
Permutations: 252 items. Calculated in 1.0340404510498047s
Combinations: 252 items. Calculated in 0.00103759765625s
Binary ints: 252 items. Calculated in 0.002962350845336914s

12 choose 6 = 924.0
Permutations: Gave up after 30 seconds.
Combinations: 924 items. Calculated in 0.001998424530029297s
Binary ints: 924 items. Calculated in 0.023003339767456055s

14 choose 7 = 3432.0
Permutations: Gave up after 30 seconds.
Combinations: 3432 items. Calculated in 0.011020183563232422s
Binary ints: 3432 items. Calculated in 0.10396194458007812s

16 choose 8 = 12870.0
Permutations: Gave up after 30 seconds.
Combinations: 12870 items. Calculated in 0.049001455307006836s
Binary ints: 12870 items. Calculated in 0.42699623107910156s

20 choose 10 = 184756.0
Permutations: Gave up after 30 seconds.
Combinations: 184756 items. Calculated in 0.6319949626922607s
Binary ints: 184756 items. Calculated in 7.8870697021484375s

22 choose 11 = 705432.0
Permutations: Gave up after 30 seconds.
Combinations: 705432 items. Calculated in 2.974030017852783s
Binary ints: Gave up after 30 seconds.

24 choose 12 = 2704156.0
Permutations: Gave up after 30 seconds.
Combinations: 2704156 items. Calculated in 11.795861721038818s
Binary ints: Gave up after 30 seconds.

26 choose 13 = 10400600.0
Permutations: Gave up after 30 seconds.
Combinations: 10400600 items. Calculated in 51.053600549697876s
Binary ints: Gave up after 30 seconds.

代码:

import numpy as np
from itertools import permutations, combinations
import scipy.special
import time

n = 26
k = n//2

test_permutations = n <= 10
test_combinations = True
test_binaryints = n <= 20

test_print_outputs = n <= 6

n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))

U,D = '1','0'

def calc_permutations():
    seed = U * k + D * (n-k)
    l = [''.join(p) for p in permutations(seed)]
    return list(set(l))


def calc_combinations():
    '''Suggestion from https://whosebug.com/users/523612/karl-knechtel
    
    '''
    def bit_strings(size, one_count):
        for one_indices in combinations(range(size), one_count):
            yield ''.join(U if i in one_indices else D for i in range(size))

    return list(bit_strings(n, k))


def calc_binaryints():
    '''
    Suggestion from https://whosebug.com/users/9046247/aramakus
    
    '''
    def check_bits(num, k, n):
        """ True if `k` ones inside the number.
        From https://whosebug.com/users/7505395/patrick-artner
        
        """
        return sum( (num & 2**rank)>>rank for rank in range(n)) == k
        
    def to_bin(i):
        return format(i, '0'+str(n) + 'b')

    start_int = 2**k-1
    end_int = 2**n
    l = [to_bin(i) for i in range(start_int, end_int) if check_bits(i,k,n)]
    return l


def test_method(name, fn, do_test, sort=True):
    print(name + ':', end='')
    if do_test:
        start_time = time.time()
        l = fn()
        duration = time.time() - start_time
        if sort: l.sort()
        print(' {} items. Calculated in {}s'.format(len(l), duration))
        if test_print_outputs: print(l)
    else:
        print(' Gave up after 30 seconds.')

test_method('Permutations', calc_permutations, test_permutations)
test_method('Combinations', calc_combinations, test_combinations)
test_method('Binary ints', calc_binaryints, test_binaryints)

这不是一个公平的解决方案,但您可以计算一个 2**k-1 并将 0 和 2**k-1 之间的所有整数以二进制格式写入。此列表将包含所有 0 和 1 的组合,共有 k 个数字。

显示整个排列列表(作为字符串)是一个很大的内存消耗。您可以使用@aramakus 的修改版本,通过使用整数和检查器使用位移位来仅生成具有 k "ones":

def check(num, k, n):
    """ True if `k` ones inside the number."""
    # uses bit shifting and masking to get number of 1s
    return sum( (num & 2**rank)>>rank for rank in range(n)) == k

def pretty_print(l, n):
    """Pretty printer for list n (integer elements) padded to n digits"""
    for num in l:
        print((f'{num:0{n}b}'))
        # or without formatting mini language
        # print(('000000000'+bin(num)[2:])[-n:])

n = 15     # tested for up to 18 (which takes ~25s on my laptop)
k = n//2  
nums = [i for i in range(2**n) if check(i,k,n)]

pretty_print(nums,n) 

# total numbers is just 2**n, the amount of them with k ones is countable
print(f'exactly {k} ones: {len(nums)} out of {2**n} numbers')

输出:

000000001111111
000000010111111
000000011011111
000000011101111
000000011110111
...
111111001000000
111111010000000
111111100000000
exactly 7 ones: 6435 out of 32768 numbers

使用 time.time() 的大致计算时间(因此不是 repeated/averaged 多次运行的时间,并且肯定涉及相当多的错误度量):

 # You should do this using timeit.timeit() for serious measurements

 n   k   time in seconds
 5   2   0.0019998550415039062
 6   3   0.004002809524536133
 7   3   0.009006261825561523
 8   4   0.020014286041259766
 9   4   0.04403090476989746
10   5   0.09506702423095703
11   5   0.20834732055664062
12   6   0.4523327350616455
13   6   0.9736926555633545
14   7   2.0954811573028564
15   7   4.479296922683716
16   8   9.40683913230896
17   8   19.881306886672974
18   9   41.978920459747314

# somewhat "linear" calculation time

However, I'm not sure if it's possible to use it to generate output in the way that I require, since my problem isn't exactly selecting a subset of k elements from a set of length n, where order does not matter.

其实有点横向思维。您想要 select 的集合是 索引 的集合,其中 1 应该出现在给定的输出中。

因此:使用 itertools.combinations 确定 1 应该去的索引(我们从 n 可能的索引值中选择 k 值 - 0n-1, inclusive - 没有替换;这正是组合),然后为每个生成字符串。例如,作为生成器:

def bit_strings(size, one_count):
    for one_indices in itertools.combinations(range(size), one_count):
        yield ''.join('1' if i in one_indices else '0' for i in range(size))

>>> len(list(bit_strings(20, 10))) # takes a bit less than a second on my machine
184756

当然,这仍然比直接计算组合数要慢得多(字面上!)。