来自 2 个列表的所有排列，但以元素数量为条件

Question

在 Python 我有 :

a = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

我正在尝试构建从列表 a 或 b 中选择的 10 个字符的所有可能排列，其中包括 any 排列包含: 至少 a 的 3 个元素和至少 b 的 3 个元素。这些元素可以重复，因为它们是从列表中选择的。

例如，有效排列：wa85ff3d56、333aaaaaaa、a33333aa33

执行此操作的最 pythonic、最有效的方法是什么？

PS：是的，我知道，答案会非常大。这是预期的。它不会存储在内存中，它会被流式传输并用于实时计算。

现在，我生成了所有可能的组合，然后对它们进行排列。但是，计算肯定需要几个月的时间。我有这段代码，它可以解决问题，但它显然不是最有效的方法。例如，列表 a 中的 3 和列表 b 中的 7，它会生成 2.864429568e+14 个排列。所以考虑到我必须这样做 5 次 (3,7), (4,6), (5,5), (6,4), (7,3)，我总共会得到 1.432214784e+15 个排列。相比之下，如果没有限制，就会有36^10 = 3.65615844e+15。所以这个方法将删除几乎一半的可能排列。

import string
import itertools


a = list(string.digits)
b = list(string.ascii_lowercase)

a_combinaisons_3 = list(itertools.combinations(a, 3))
b_combinaisons_7 = list(itertools.combinations(b, 7))

a3b7_combinaisons = []

for a_element in a_combinaisons_3:
    for b_element in b_combinaisons_7:
        a3b7_combinaisons.append(a_element + b_element)

a3b7_permutations = list(itertools.permutations(a3b7_combinaisons))
print(len(a3b7_combinaisons))  # 78936000
print(len(a3b7_permutations))  # 1.432214784e+15

Answer 1

看起来像是排列的排列？如果您需要允许列表中的每个元素在排列中重复，最简单的方法可能是扩展列表。然而，这看起来会变得很大。

from itertools import product, permutations

def permutem(l1, l2, min_num=3, length=10):
    for n in range(min_num, length - min_num + 1):
        a, b = n, length - n
        for result in permutations(product(l1, r=a), product(l2, r=b)):
            yield result

Answer 2

可能有一种运行时更高效的方法来生成您的值，但这里有一个 space 具有相当简单逻辑的高效实现：

def valid_patterns(l1, l2, length=10, min_from_l1=3, min_from_l2=3):
    # use sets instead of lists as we're going to be calling `in` a lot below
    s1, s2 = set(l1), set(l2)

    # generate all possible patterns (combinations with replacement)
    for pattern in itertools.product(l1+l2, repeat=length):
        # check to see if the pattern is valid
        if sum(ch in s1 for ch in pattern) >= min_from_l1 and \
           sum(ch in s2 for ch in pattern) >= min_from_l2:
            yield pattern

Answer 3

我想在这里尝试写一个通用的答案，希望以后有一个很好的规范目标来解决重复的问题。

Python 中的组合基础与 `itertools`

重新排序和替换（重复）

了解 itertools 提供的各种 combinatoric iterators 如何工作以及它们之间的区别很重要。

让我们考虑一个简单的候选集 A = [1, 2, 3, 4]，我们想从中得出三个元素的“组合”（如提问者通常所说的那样）。

>>> A = [1,2,3,4]

itertools.combinations selects 没有重新排序（即，每个输出值将以与输入中相同的顺序出现）并且结果值没有重复。因此，这只会产生 4 个结果：

>>> list(itertools.combinations(A, 3))
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]

itertools.permutations 表示结果可以以任何顺序出现，即输入的给定子集将以不同的顺序在输出中多次出现。

>>> list(itertools.permutations(A, 3)) # 24 results omitted for brevity

四个 combinations 分别出现在六个订单中，总共有 24 个结果。

itertools.combinations_with_replacement selects 无需重新排序，但允许多次选择输入中的元素：

>>> list(itertools.combinations_with_replacement(A, 3)) # 20 results omitted for brevity

每个元素被选择三次的结果有四种，其中 6 个是 double 后跟一个 single（必须更高），6 个是 single 后跟一个 double，加上四个 combinations 所有单身人士。在一般情况下统计结果并不容易。

如果你想在输出结果中允许重复和重新排序，你可以使用 itertools.product:

>>> list(itertools.product(A, repeat=3)) # 64 results omitted for brevity

简单的说，每次三次我们从四个元素中自由选择，得到4 * 4 * 4 = 64个结果。

itertools.product 实现了所谓的 Cartesian product。一般情况下，它会从多个指定序列中各取一个元素。但是生成“带替换的排列”与多次从同一序列中提取一个元素是一回事，因此 repeat 关键字作为此操作的便捷快捷方式提供 - 因此您不必指定序列多次。我的意思是，你可以写 itertools.product(*[A]*3)，但这很丑陋。

候选集中重复怎么办？

这与 OP 提出的问题无关；但为了完整起见，重要的是要了解 none 这些函数关心候选集的元素是否相等，甚至相同：

>>> x = object()
>>> candidates = [x, x, x, x]
>>> results = list(itertools.combinations(candidates, 3))
>>> len(results)
4
>>> results[0] == results[1] == results[2] == results[3]
True

我们如何实施约束？

最简单的方法是生成一个包容性结果（在 OP 的情况下，通过将 a 和 b 候选者连接在一起，并生成其中 10 个的 product）和过滤掉不符合要求的东西。然而，这是低效的并且可能很丑陋（我们需要分析输出元组以确定其元素是否满足条件 - 在 OP 的情况下，它们是从 a 还是从 b 中提取的。如果你确实想采用这样的方法，我通常建议编写一个生成器函数：

def OP_problem():
    for result in itertools.product(a+b, repeat=10):
        a_count = len(x for x in result if x in a)
        # the trick is that every element was either from a or b;
        # so "at least 3 a's and at least 3 b's" is equivalent to
        # "at least 3 a's and at most 7 a's".
        if 3 <= a_count <= 7:
            yield result

或者在足够简单的情况下，生成器表达式：

(
    # maybe you don't think this is simple enough :)
    result for result in itertools.product(a+b, repeat=10)
    if 3 <= len(x for x in result if x in a) <= 7
)

通常最好尝试将问题分解成更小的部分并将结果放在一起。例如，该文档有一个计算 power sets 的方法，它简单地将 0 个元素、1 个元素等组合的结果链接到所有元素。（另一种方法是找到 N 个布尔值的笛卡尔积，每个布尔值代表是否包含给定的元素，然后将它们转化为子集。）

在我们的例子中，我们可以分别找到每个 a 元素计数的结果。让我们考虑每个列表中有 5 个元素的情况；应该清楚如何归纳和组合结果（提示：使用 itertools.chain.from_iterable，如文档中的 powerset 配方所示）。

难点案例：分区和位置selection

我想在这里展示一种高级技术，以解决 selecting 来自 a 的 5 个元素和来自 b 的 5 个元素并将它们混合在一起的问题。这个想法很简单，我们 select a 将去的位置 ，在所有可能的位置中（即，索引到 10 个元素的序列中），并且对于每个，生成相应的输出结果。这些位置是没有替换（你明白为什么吗？）可能索引值的组合。

因此，类似于：

def make_combination(letter_positions, chosen_letters, chosen_digits):
    result = [None] * 10
    for letter, position in zip(chosen_letters, letter_positions):
        result[position] = letter
    # Figure out where the digits go, using set arithmetic to find the
    # remaining positions, then putting them back in ascending order.
    digit_positions = sorted(set(range(10)) - set(chosen_letters))
    for digit, position in zip(chosen_digits, digit_positions):
        result[position] = digit
    assert None not in result
    return tuple(result)


def five_letters_and_five_digits():
    letters = 'abcdefghijklmnopqrstuvwxyz'
    digits = '0123456789'
    # It's not *easy*, but it's fairly elegant.
    # We separately generate the letter positions, letter selection
    # and digit selection, using `product` to get the cartesian product
    # of *those* possibilities; then for each of those, we translate
    # into a desired output - using `starmap` to iterate.
    return itertools.starmap(
        make_combination, 
        itertools.product(
            itertools.combinations(range(10), 5),
            itertools.product(letters, repeat=5),
            itertools.product(digits, repeat=5)
        )
    )

选择位置的技巧对于解决 partitioning 问题也很有用。这个想法很简单，你选择 分区所在的位置 （对于 N 个元素，通常会有 N-1 个位置来放置它们），作为组合 - 或者与（如果大小为零的分区被允许）或没有（否则）替换。

Answer 4

案例没有重复：

目标 1: 找出字符属于不同“集合”的单词。每个单词由 10 个字符组成：w = c_1 ... c_10 这样每个单词都包含来自两个“集合”的 3 个字符。

目标 2：找到这些词的所有可能排列

数据

import string

alphabet = string.ascii_lowercase # "set" 1
digits = string.digits            # "set" 2

设一个词为 w = d_1...d_i a_1...a_j 使得 i+j=10 且 i,j >= 3 其中 d（数字）是“集合”的字符，a（字母）是另一个的字符。

步骤1：(i, j)-pairs的可能取值：一个词的划分只取决于每个集合中元素的数量，而不是元素本身。等价于将 10 个元素中的所有分区找到两个大小都大于或等于三的集合：

N = 10 # size of a "word"
# constraints on the amounts of type of characters in a word
lower_bound1, lower_bound2 = 3, 3

def word_partitioner(list1, min1, list2, min2, word_length=10):
    # return pairs which entries represent the amount of characters to be used to form a word of length word_length
    # min1, min2 represents the lower bounds of the type of characters to be contained in the word
    return [(i, j) for i, j in product(range(min1, len(list1)), range(min2, len(list2))) if i + j == word_length]

partitions = word_partitioner(list1, lower_bound1, list2, lower_bound2, N)
print(partitions)

输出：

[(3, 7), (4, 6), (5, 5), (6, 4), (7, 3)]

第2步：每组的每个单词都是唯一的，直到组合数量：有n个选择k个长度为k的单词out of n [NB 也叫二项式系数，nCk]:

在分区 (i, j) = (3, 7) 上可以分配 10C3 x 26C7 个长度为 10 的单词，这些单词由两个“集合”的字符组成。

第3步：对得到的单词中的每个字符进行排列，使每个单词唯一直到单词大小N

的阶乘

(i, j) = (3, 7) --> 10C3 x 26C7 --> 10C3 x 26C7 * N!

     (3, 7)--> 78936000 --> 286442956800000

第四步：遍历每一种可能的分词方式

import math
def words_stats(list1, n1, list2, n2):
    # return total amount of words per partition + print to screen info
    bin_prod = math.comb(len(list1), n1) * math.comb(len(list2), n2)
    fact = math.factorial(n1 + n2)
    tot = bin_prod * fact
    print(f'{len(list1)}C{n1} x {len(list2)}C{n2}  x ({n1+n2})! = {bin_prod} x {fact} = {tot}')
    return tot

print('Info:')
amount_of_possibilities = 0
for k1, k2 in partitions:
    amount_of_possibilities += words_stats(list1, k1, list2, k2)
print(f'Total possible words: {amount_of_possibilities}')

输出

Preliminar info:
10C3 x 26C7  x (10)! = 78936000 x 3628800 = 286442956800000
10C4 x 26C6  x (10)! = 48348300 x 3628800 = 175446311040000
10C5 x 26C5  x (10)! = 16576560 x 3628800 = 60153020928000
10C6 x 26C4  x (10)! = 3139500 x 3628800 = 11392617600000
10C7 x 26C3  x (10)! = 312000 x 3628800 = 1132185600000
Total possible words: 534567091968000

制作所有可能单词的字符串 (!!)（推荐用于小示例）

from itertools import permutations, combinations, product, 

def combinatorial_problem_formatter(list1, n1, list2, n2):
    # return table-like string of the possible words per partition
    print(f'Rows x Cols:  {len(list1)}C{n1} x {len(list2)}C{n2} x ({n1+n2})!')
    s = ''
    for pair1, pair2 in product(combinations(list1, n1), combinations(list2, n2)):
        for p in permutations(pair1 + pair2):
            s += ''.join(p) + ' '
        s += '\n'
    return s

玩具模型:

set1: [0, 1, 3, 4] 和 set2: [a, b, c, d]

查找长度为 N=3 且至少包含 1 个数字和 1 个字母的单词 w = c_1 c_2 c_3

N = 3 # size of a "word"

lower_bound1, lower_bound2 = 1, 1   # constraints on the amounts of type of characters in a word
list1, list2 = digits[:4], alphabet[:4]     # restrictions (for pedagogical reasons)
#

partitions = word_partitioner(list1, lower_bound1, list2, lower_bound2, N)
print('Partitions', partitions)

print('Preliminar info:')
amount_of_possibilities = 0
for k1, k2 in partitions:
    amount_of_possibilities += words_stats(list1, k1, list2, k2)
print(f'Total possible words: {amount_of_possibilities}')

print('='*40, '\n')

for k1, k2 in partitions:
    print(combinatorial_problem_formatter(list1, k1, list2, k2))
    print()

输出玩具模型

Partitions [(1, 2), (2, 1)]
Preliminar info:
4C1 x 4C2  x (3)! = 24 x 6 = 144
4C2 x 4C1  x (3)! = 24 x 6 = 144
Total possible words: 288
======================================== 

Rows x Cols:  4C1 x 4C2 x (3)!
0ab 0ba a0b ab0 b0a ba0 
0ac 0ca a0c ac0 c0a ca0 
0ad 0da a0d ad0 d0a da0 
0bc 0cb b0c bc0 c0b cb0 
0bd 0db b0d bd0 d0b db0 
0cd 0dc c0d cd0 d0c dc0 
1ab 1ba a1b ab1 b1a ba1 
1ac 1ca a1c ac1 c1a ca1 
1ad 1da a1d ad1 d1a da1 
1bc 1cb b1c bc1 c1b cb1 
1bd 1db b1d bd1 d1b db1 
1cd 1dc c1d cd1 d1c dc1 
2ab 2ba a2b ab2 b2a ba2 
2ac 2ca a2c ac2 c2a ca2 
2ad 2da a2d ad2 d2a da2 
2bc 2cb b2c bc2 c2b cb2 
2bd 2db b2d bd2 d2b db2 
2cd 2dc c2d cd2 d2c dc2 
3ab 3ba a3b ab3 b3a ba3 
3ac 3ca a3c ac3 c3a ca3 
3ad 3da a3d ad3 d3a da3 
3bc 3cb b3c bc3 c3b cb3 
3bd 3db b3d bd3 d3b db3 
3cd 3dc c3d cd3 d3c dc3 


Rows x Cols:  4C2 x 4C1 x (3)!
01a 0a1 10a 1a0 a01 a10 
01b 0b1 10b 1b0 b01 b10 
01c 0c1 10c 1c0 c01 c10 
01d 0d1 10d 1d0 d01 d10 
02a 0a2 20a 2a0 a02 a20 
02b 0b2 20b 2b0 b02 b20 
02c 0c2 20c 2c0 c02 c20 
02d 0d2 20d 2d0 d02 d20 
03a 0a3 30a 3a0 a03 a30 
03b 0b3 30b 3b0 b03 b30 
03c 0c3 30c 3c0 c03 c30 
03d 0d3 30d 3d0 d03 d30 
12a 1a2 21a 2a1 a12 a21 
12b 1b2 21b 2b1 b12 b21 
12c 1c2 21c 2c1 c12 c21 
12d 1d2 21d 2d1 d12 d21 
13a 1a3 31a 3a1 a13 a31 
13b 1b3 31b 3b1 b13 b31 
13c 1c3 31c 3c1 c13 c31 
13d 1d3 31d 3d1 d13 d31 
23a 2a3 32a 3a2 a23 a32 
23b 2b3 32b 3b2 b23 b32 
23c 2c3 32c 3c2 c23 c32 
23d 2d3 32d 3d2 d23 d32

注意：没有实现重复的情况；不是获取完整单词列表的明确方式，就像字符串版本一样。要获得所有此类单词的 list/iterator，请修改 combinatorial_problem_formatter 函数以跟踪已经很好的答案。

Answer 5

# A Python program to print all
# permutations of given length
from itertools import permutations

# check_validity(tuple1) checks whether the combination contains at least 3 elements from list a and at least 3 elements from list b
def check_validity(tuple1):
    a_cnt = 0
    b_cnt = 0
    for t in tuple1:
        if t in a:
            a_cnt += 1
        if t in b:
            b_cnt += 1
    if a_cnt >= 3 and b_cnt>=3:
        return True
    else:
        return False

# Get all permutations of length 10
a = ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']
b = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

list1 = a + b     # merge two lists
perm = list(permutations(list1, 10))     # find all permutaions

# Print the obtained permutations
result_perm = []
for tuple1 in perm:          # list of permutations contain tuples
    if check_validity(tuple1) == True:    # check_validity(tuple1) checks whether the combination contains at least 3 elements from list a and at least 3 elements from list b
        result_perm.append("".join(list(map(str,tuple1))))     # join characters from tuple to generate a string

result_perm

来自 2 个列表的所有排列，但以元素数量为条件

all permutations from 2 lists but with a condition on the amount of elements

python

permutation

Python 中的组合基础与 itertools

重新排序和替换（重复）

候选集中重复怎么办？

我们如何实施约束？

难点案例：分区和位置selection

Python 中的组合基础与 `itertools`