在 Python 中恰好有 k 个 HEADS 的 n 次抛掷中的 HEADS 和 TAILS 的排列列表
list of permutations of HEADS and TAILS in n tosses with exactly k HEADS in Python
我正在尝试编写一些 python 代码来列出给定 n 次抛掷和正好 k 次正面朝上的不同排列。例如。对于 n=6
次抛球和 k=3
次正面朝上,我想要的输出是 ['000111', '001011', '001101', '001110', '010011', '010101', '010110', '011001', '011010', '011100', '100011', '100101', '100110', '101001', '101010', '101100', '110001', '110010', '110100', '111000']
(注意:这不是关于计算概率的问题,没有问题。我的问题是为 n 次抛掷和正好 k 次正面生成所有不同的排列。
我知道最终字符串的 数 由二项式系数也就是组合又名 n_choose_k 给出(例如 scipy.special.comb
)。但是,我在生成这些不同选项中的每一个时都遇到了问题,因为抛出的次数超过适度(n> 10)。我的第一个天真的方法是生成一个字符串的所有可能排列,例如 "1111100000"
(使用 itertools.permutations
),然后我使用 set()
删除重复项。这适用于低 n (n<=10
)。然而,这是非常低效的。即使使用 n=10 & k=5
,排列数也为 3,628,800,而组合数仅为 252。在 n>10
之后,这种情况逐渐停止。我知道 itertools
也有 combinations
生成器(和 _with_replacements
)。但是,我不确定是否可以使用它以我需要的方式生成输出,因为我的问题并不是从长度为 n 的集合中准确地选择 k 个元素的子集,其中顺序无关紧要。
我的初始代码如下。
from itertools import permutations
import scipy.special
n = 10
k = n//2
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))
U,D = '1','0'
seed = U * k + D * (n-k)
permlist = [''.join(p) for p in permutations(seed)]
permset = sorted(list(set(permlist)))
print(permset)
print('len permutations:{}, len set:{}'.format(len(permlist), len(permset)))
注意:我对 n 次掷球和 正好 k 次正面很感兴趣,尽管我很想知道如何将解决方案至少扩展到 也有 k 个头。
更新:
我接受了karl-knechtel's 。但是对于那些好奇的人,这里是所有三种方法 运行 次(在配备 i7-9750H @2.6Ghz 的笔记本电脑上)
结果:
10 choose 5 = 252.0
Permutations: 252 items. Calculated in 1.0340404510498047s
Combinations: 252 items. Calculated in 0.00103759765625s
Binary ints: 252 items. Calculated in 0.002962350845336914s
12 choose 6 = 924.0
Permutations: Gave up after 30 seconds.
Combinations: 924 items. Calculated in 0.001998424530029297s
Binary ints: 924 items. Calculated in 0.023003339767456055s
14 choose 7 = 3432.0
Permutations: Gave up after 30 seconds.
Combinations: 3432 items. Calculated in 0.011020183563232422s
Binary ints: 3432 items. Calculated in 0.10396194458007812s
16 choose 8 = 12870.0
Permutations: Gave up after 30 seconds.
Combinations: 12870 items. Calculated in 0.049001455307006836s
Binary ints: 12870 items. Calculated in 0.42699623107910156s
20 choose 10 = 184756.0
Permutations: Gave up after 30 seconds.
Combinations: 184756 items. Calculated in 0.6319949626922607s
Binary ints: 184756 items. Calculated in 7.8870697021484375s
22 choose 11 = 705432.0
Permutations: Gave up after 30 seconds.
Combinations: 705432 items. Calculated in 2.974030017852783s
Binary ints: Gave up after 30 seconds.
24 choose 12 = 2704156.0
Permutations: Gave up after 30 seconds.
Combinations: 2704156 items. Calculated in 11.795861721038818s
Binary ints: Gave up after 30 seconds.
26 choose 13 = 10400600.0
Permutations: Gave up after 30 seconds.
Combinations: 10400600 items. Calculated in 51.053600549697876s
Binary ints: Gave up after 30 seconds.
代码:
import numpy as np
from itertools import permutations, combinations
import scipy.special
import time
n = 26
k = n//2
test_permutations = n <= 10
test_combinations = True
test_binaryints = n <= 20
test_print_outputs = n <= 6
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))
U,D = '1','0'
def calc_permutations():
seed = U * k + D * (n-k)
l = [''.join(p) for p in permutations(seed)]
return list(set(l))
def calc_combinations():
'''Suggestion from https://whosebug.com/users/523612/karl-knechtel
'''
def bit_strings(size, one_count):
for one_indices in combinations(range(size), one_count):
yield ''.join(U if i in one_indices else D for i in range(size))
return list(bit_strings(n, k))
def calc_binaryints():
'''
Suggestion from https://whosebug.com/users/9046247/aramakus
'''
def check_bits(num, k, n):
""" True if `k` ones inside the number.
From https://whosebug.com/users/7505395/patrick-artner
"""
return sum( (num & 2**rank)>>rank for rank in range(n)) == k
def to_bin(i):
return format(i, '0'+str(n) + 'b')
start_int = 2**k-1
end_int = 2**n
l = [to_bin(i) for i in range(start_int, end_int) if check_bits(i,k,n)]
return l
def test_method(name, fn, do_test, sort=True):
print(name + ':', end='')
if do_test:
start_time = time.time()
l = fn()
duration = time.time() - start_time
if sort: l.sort()
print(' {} items. Calculated in {}s'.format(len(l), duration))
if test_print_outputs: print(l)
else:
print(' Gave up after 30 seconds.')
test_method('Permutations', calc_permutations, test_permutations)
test_method('Combinations', calc_combinations, test_combinations)
test_method('Binary ints', calc_binaryints, test_binaryints)
这不是一个公平的解决方案,但您可以计算一个 2**k-1
并将 0 和 2**k-1
之间的所有整数以二进制格式写入。此列表将包含所有 0 和 1 的组合,共有 k
个数字。
显示整个排列列表(作为字符串)是一个很大的内存消耗。您可以使用@aramakus 的修改版本,通过使用整数和检查器使用位移位来仅生成具有 k
"ones":
def check(num, k, n):
""" True if `k` ones inside the number."""
# uses bit shifting and masking to get number of 1s
return sum( (num & 2**rank)>>rank for rank in range(n)) == k
def pretty_print(l, n):
"""Pretty printer for list n (integer elements) padded to n digits"""
for num in l:
print((f'{num:0{n}b}'))
# or without formatting mini language
# print(('000000000'+bin(num)[2:])[-n:])
n = 15 # tested for up to 18 (which takes ~25s on my laptop)
k = n//2
nums = [i for i in range(2**n) if check(i,k,n)]
pretty_print(nums,n)
# total numbers is just 2**n, the amount of them with k ones is countable
print(f'exactly {k} ones: {len(nums)} out of {2**n} numbers')
输出:
000000001111111
000000010111111
000000011011111
000000011101111
000000011110111
...
111111001000000
111111010000000
111111100000000
exactly 7 ones: 6435 out of 32768 numbers
使用 time.time()
的大致计算时间(因此不是 repeated/averaged 多次运行的时间,并且肯定涉及相当多的错误度量):
# You should do this using timeit.timeit() for serious measurements
n k time in seconds
5 2 0.0019998550415039062
6 3 0.004002809524536133
7 3 0.009006261825561523
8 4 0.020014286041259766
9 4 0.04403090476989746
10 5 0.09506702423095703
11 5 0.20834732055664062
12 6 0.4523327350616455
13 6 0.9736926555633545
14 7 2.0954811573028564
15 7 4.479296922683716
16 8 9.40683913230896
17 8 19.881306886672974
18 9 41.978920459747314
# somewhat "linear" calculation time
However, I'm not sure if it's possible to use it to generate output in the way that I require, since my problem isn't exactly selecting a subset of k elements from a set of length n, where order does not matter.
其实有点横向思维。您想要 select 的集合是 索引 的集合,其中 1 应该出现在给定的输出中。
因此:使用 itertools.combinations
确定 1 应该去的索引(我们从 n
可能的索引值中选择 k
值 - 0
到 n-1
, inclusive - 没有替换;这正是组合),然后为每个生成字符串。例如,作为生成器:
def bit_strings(size, one_count):
for one_indices in itertools.combinations(range(size), one_count):
yield ''.join('1' if i in one_indices else '0' for i in range(size))
>>> len(list(bit_strings(20, 10))) # takes a bit less than a second on my machine
184756
当然,这仍然比直接计算组合数要慢得多(字面上!)。
我正在尝试编写一些 python 代码来列出给定 n 次抛掷和正好 k 次正面朝上的不同排列。例如。对于 n=6
次抛球和 k=3
次正面朝上,我想要的输出是 ['000111', '001011', '001101', '001110', '010011', '010101', '010110', '011001', '011010', '011100', '100011', '100101', '100110', '101001', '101010', '101100', '110001', '110010', '110100', '111000']
(注意:这不是关于计算概率的问题,没有问题。我的问题是为 n 次抛掷和正好 k 次正面生成所有不同的排列。
我知道最终字符串的 数 由二项式系数也就是组合又名 n_choose_k 给出(例如 scipy.special.comb
)。但是,我在生成这些不同选项中的每一个时都遇到了问题,因为抛出的次数超过适度(n> 10)。我的第一个天真的方法是生成一个字符串的所有可能排列,例如 "1111100000"
(使用 itertools.permutations
),然后我使用 set()
删除重复项。这适用于低 n (n<=10
)。然而,这是非常低效的。即使使用 n=10 & k=5
,排列数也为 3,628,800,而组合数仅为 252。在 n>10
之后,这种情况逐渐停止。我知道 itertools
也有 combinations
生成器(和 _with_replacements
)。但是,我不确定是否可以使用它以我需要的方式生成输出,因为我的问题并不是从长度为 n 的集合中准确地选择 k 个元素的子集,其中顺序无关紧要。
我的初始代码如下。
from itertools import permutations
import scipy.special
n = 10
k = n//2
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))
U,D = '1','0'
seed = U * k + D * (n-k)
permlist = [''.join(p) for p in permutations(seed)]
permset = sorted(list(set(permlist)))
print(permset)
print('len permutations:{}, len set:{}'.format(len(permlist), len(permset)))
注意:我对 n 次掷球和 正好 k 次正面很感兴趣,尽管我很想知道如何将解决方案至少扩展到 也有 k 个头。
更新:
我接受了karl-knechtel's
结果:
10 choose 5 = 252.0 Permutations: 252 items. Calculated in 1.0340404510498047s Combinations: 252 items. Calculated in 0.00103759765625s Binary ints: 252 items. Calculated in 0.002962350845336914s 12 choose 6 = 924.0 Permutations: Gave up after 30 seconds. Combinations: 924 items. Calculated in 0.001998424530029297s Binary ints: 924 items. Calculated in 0.023003339767456055s 14 choose 7 = 3432.0 Permutations: Gave up after 30 seconds. Combinations: 3432 items. Calculated in 0.011020183563232422s Binary ints: 3432 items. Calculated in 0.10396194458007812s 16 choose 8 = 12870.0 Permutations: Gave up after 30 seconds. Combinations: 12870 items. Calculated in 0.049001455307006836s Binary ints: 12870 items. Calculated in 0.42699623107910156s 20 choose 10 = 184756.0 Permutations: Gave up after 30 seconds. Combinations: 184756 items. Calculated in 0.6319949626922607s Binary ints: 184756 items. Calculated in 7.8870697021484375s 22 choose 11 = 705432.0 Permutations: Gave up after 30 seconds. Combinations: 705432 items. Calculated in 2.974030017852783s Binary ints: Gave up after 30 seconds. 24 choose 12 = 2704156.0 Permutations: Gave up after 30 seconds. Combinations: 2704156 items. Calculated in 11.795861721038818s Binary ints: Gave up after 30 seconds. 26 choose 13 = 10400600.0 Permutations: Gave up after 30 seconds. Combinations: 10400600 items. Calculated in 51.053600549697876s Binary ints: Gave up after 30 seconds.
代码:
import numpy as np
from itertools import permutations, combinations
import scipy.special
import time
n = 26
k = n//2
test_permutations = n <= 10
test_combinations = True
test_binaryints = n <= 20
test_print_outputs = n <= 6
n_choose_k = scipy.special.comb(n, k)
print('{} choose {} = {}'.format(n, k, n_choose_k))
U,D = '1','0'
def calc_permutations():
seed = U * k + D * (n-k)
l = [''.join(p) for p in permutations(seed)]
return list(set(l))
def calc_combinations():
'''Suggestion from https://whosebug.com/users/523612/karl-knechtel
'''
def bit_strings(size, one_count):
for one_indices in combinations(range(size), one_count):
yield ''.join(U if i in one_indices else D for i in range(size))
return list(bit_strings(n, k))
def calc_binaryints():
'''
Suggestion from https://whosebug.com/users/9046247/aramakus
'''
def check_bits(num, k, n):
""" True if `k` ones inside the number.
From https://whosebug.com/users/7505395/patrick-artner
"""
return sum( (num & 2**rank)>>rank for rank in range(n)) == k
def to_bin(i):
return format(i, '0'+str(n) + 'b')
start_int = 2**k-1
end_int = 2**n
l = [to_bin(i) for i in range(start_int, end_int) if check_bits(i,k,n)]
return l
def test_method(name, fn, do_test, sort=True):
print(name + ':', end='')
if do_test:
start_time = time.time()
l = fn()
duration = time.time() - start_time
if sort: l.sort()
print(' {} items. Calculated in {}s'.format(len(l), duration))
if test_print_outputs: print(l)
else:
print(' Gave up after 30 seconds.')
test_method('Permutations', calc_permutations, test_permutations)
test_method('Combinations', calc_combinations, test_combinations)
test_method('Binary ints', calc_binaryints, test_binaryints)
这不是一个公平的解决方案,但您可以计算一个 2**k-1
并将 0 和 2**k-1
之间的所有整数以二进制格式写入。此列表将包含所有 0 和 1 的组合,共有 k
个数字。
显示整个排列列表(作为字符串)是一个很大的内存消耗。您可以使用@aramakus k
"ones":
def check(num, k, n):
""" True if `k` ones inside the number."""
# uses bit shifting and masking to get number of 1s
return sum( (num & 2**rank)>>rank for rank in range(n)) == k
def pretty_print(l, n):
"""Pretty printer for list n (integer elements) padded to n digits"""
for num in l:
print((f'{num:0{n}b}'))
# or without formatting mini language
# print(('000000000'+bin(num)[2:])[-n:])
n = 15 # tested for up to 18 (which takes ~25s on my laptop)
k = n//2
nums = [i for i in range(2**n) if check(i,k,n)]
pretty_print(nums,n)
# total numbers is just 2**n, the amount of them with k ones is countable
print(f'exactly {k} ones: {len(nums)} out of {2**n} numbers')
输出:
000000001111111
000000010111111
000000011011111
000000011101111
000000011110111
...
111111001000000
111111010000000
111111100000000
exactly 7 ones: 6435 out of 32768 numbers
使用 time.time()
的大致计算时间(因此不是 repeated/averaged 多次运行的时间,并且肯定涉及相当多的错误度量):
# You should do this using timeit.timeit() for serious measurements
n k time in seconds
5 2 0.0019998550415039062
6 3 0.004002809524536133
7 3 0.009006261825561523
8 4 0.020014286041259766
9 4 0.04403090476989746
10 5 0.09506702423095703
11 5 0.20834732055664062
12 6 0.4523327350616455
13 6 0.9736926555633545
14 7 2.0954811573028564
15 7 4.479296922683716
16 8 9.40683913230896
17 8 19.881306886672974
18 9 41.978920459747314
# somewhat "linear" calculation time
However, I'm not sure if it's possible to use it to generate output in the way that I require, since my problem isn't exactly selecting a subset of k elements from a set of length n, where order does not matter.
其实有点横向思维。您想要 select 的集合是 索引 的集合,其中 1 应该出现在给定的输出中。
因此:使用 itertools.combinations
确定 1 应该去的索引(我们从 n
可能的索引值中选择 k
值 - 0
到 n-1
, inclusive - 没有替换;这正是组合),然后为每个生成字符串。例如,作为生成器:
def bit_strings(size, one_count):
for one_indices in itertools.combinations(range(size), one_count):
yield ''.join('1' if i in one_indices else '0' for i in range(size))
>>> len(list(bit_strings(20, 10))) # takes a bit less than a second on my machine
184756
当然,这仍然比直接计算组合数要慢得多(字面上!)。