How to create an iterator that produces items where no item has a single character represented more than n times in Python?

Question

I created a script that iterates over all combinations of the characters in the sCharacters string using the following code:

import itertools

sCharacters = "abcdefghijklmnopqrstuvwxyz0123456789"
iKeyLength = len(sCharacters)

for sCharacterCombination in itertools.product(sCharacters, repeat=iKeyLength):
    # Do Stuff Here
    pass

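However, I'm only interested in combinations where no single character appears more than n times in sCharacterCombination. For example, I want to filter out strings like aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa and aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaab, and only get strings like 7efad3ca411bf57f1df57c0b4e9282aa.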

I tried just checking each sCharacterCombination, but that isn't good because I still have to iterate over a bunch of items I'll never use.
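For reference, the check-every-item approach described above might look something like the following minimal sketch. The helper name `naive_filter` and the use of `collections.Counter` are assumptions, not the questioner's actual code. It still visits every tuple that `itertools.product` generates; with the question's 36 characters and `repeat=36` that is 36**36 (more than 10**56) tuples.

```python
from collections import Counter
from itertools import product

def naive_filter(chars, length, n):
    """Hypothetical helper: generate every candidate, then keep only those
    in which no character occurs more than n times.

    The loop body runs len(chars) ** length times no matter how few
    candidates survive the check.
    """
    for combo in product(chars, repeat=length):
        # Counter maps each character to its number of occurrences.
        if all(count <= n for count in Counter(combo).values()):
            yield ''.join(combo)

# 3 ** 3 = 27 candidates are generated; only the 6 orderings of "abc" survive.
kept = list(naive_filter("abc", 3, 1))
print(len(kept))  # 6
```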

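How can I make the iterator build only items in which no single character is represented more than n times in the first place, so that I don't have to iterate over items I won't use?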

It would be great if I could specify:

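The maximum number of times a single character can be represented in sCharacterCombination.
The maximum number of times a single character can be represented in a row.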

That is, a single character may appear at most four times in sCharacterCombination, but no more than twice in a row. For example, 1121... is fine, but 1112... is not.
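The two limits in the example above (at most four occurrences in total, at most two in a row) can be expressed as a single predicate. This is a minimal sketch; the function name `satisfies_limits` and its parameters are hypothetical. On its own it only tests one string, so it does not avoid generating the rejected candidates.

```python
from collections import Counter
from itertools import groupby

def satisfies_limits(s, max_total, max_run):
    """Hypothetical helper: True if no character occurs more than max_total
    times overall and none occurs more than max_run times consecutively."""
    # Total occurrences of each character anywhere in the string.
    if any(count > max_total for count in Counter(s).values()):
        return False
    # groupby splits the string into runs of identical adjacent characters.
    return all(sum(1 for _ in run) <= max_run for _, run in groupby(s))

print(satisfies_limits("1121", 4, 2))  # True
print(satisfies_limits("1112", 4, 2))  # False: a run of three '1's
```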

Thanks for your time.

Answer 1

Here's some code that's more efficient than your current approach.

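First, we use the itertools function combinations_with_replacement to create combinations of the desired length, filtering out those that repeat any item more times than allowed. Then we permute each combination; the permutation algorithm I'm using (created by the 14th-century Indian mathematician Narayana Pandita) handles repeated items correctly, unlike the one in itertools. Finally, we use itertools.groupby to filter out permutations containing runs of identical characters longer than the allowed run length.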
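To illustrate the two itertools behaviours this approach relies on, here is a small sketch. It shows that itertools.permutations treats equal elements as distinct by position (so a combination with a repeated character yields duplicate orderings), and how groupby exposes the run lengths that the final filter checks.

```python
from itertools import groupby, permutations

items = "aab"
# permutations works by position, not value, so swapping the two 'a's
# produces duplicate orderings.
all_perms = [''.join(p) for p in permutations(items)]
print(len(all_perms))            # 6
print(sorted(set(all_perms)))    # ['aab', 'aba', 'baa']

# groupby yields one group per run of consecutive equal characters.
runs = [(k, len(list(g))) for k, g in groupby("aabccc")]
print(runs)                      # [('a', 2), ('b', 1), ('c', 3)]
```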

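I've included two functions. Both limit how many times each character may appear in total (maxrepeat); permutations_with_limited_repetition additionally limits the length of runs of identical characters (maxrun), while permutations_with_repetition does not.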

Note that the input sequence must be sorted from low to high, otherwise this algorithm won't work correctly.
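To see why the sorting requirement matters: the next-permutation step only moves forward to lexicographically larger arrangements, and combinations_with_replacement keeps the input order, so with unsorted input some combinations start from an arrangement that is not their smallest, and the permutations before it are never produced. The sketch below reimplements the same next-permutation step under the hypothetical name `forward_permutations` so it runs on its own; `all_arrangements` is likewise a hypothetical helper.

```python
from itertools import combinations_with_replacement

def forward_permutations(seq):
    """Yield seq and each lexicographically larger arrangement of it
    (the same next-permutation step that lexico_permute uses)."""
    a = list(seq)
    while True:
        yield ''.join(a)
        # Find the rightmost position j with a[j] < a[j + 1].
        j = len(a) - 2
        while j >= 0 and a[j] >= a[j + 1]:
            j -= 1
        if j < 0:
            return  # a is in non-increasing order: the last arrangement
        # Find the rightmost k > j with a[j] < a[k], then swap.
        k = len(a) - 1
        while a[k] <= a[j]:
            k -= 1
        a[j], a[k] = a[k], a[j]
        # Reverse the suffix to get the smallest arrangement after this one.
        a[j + 1:] = reversed(a[j + 1:])

def all_arrangements(seq, length):
    return [p for c in combinations_with_replacement(seq, length)
            for p in forward_permutations(c)]

print(all_arrangements("ba", 2))          # ['bb', 'ba', 'aa']: 'ab' is missing
print(all_arrangements(sorted("ba"), 2))  # ['aa', 'ab', 'ba', 'bb']
```

Sorting the input first, for example with `sorted(set(seq))`, is one way to meet the requirement; the `set()` part assumes the input items are meant to be distinct, which the answer does not state explicitly.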

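from itertools import combinations_with_replacement, groupby

def lexico_permute(a):
    """Yield a and every lexicographically larger arrangement of it.

    Narayana Pandita's next-permutation algorithm: repeated items are
    handled correctly, so no duplicate permutations are produced.
    """
    a = list(a)
    yield a[:]  # yield a copy so callers can safely store the results
    n = len(a) - 1
    while True:
        # Find the largest j such that a[j] < a[j + 1].
        for j in range(n-1, -1, -1):
            if a[j] < a[j + 1]:
                break
        else:
            # No such j: a is in non-increasing order, the last permutation.
            return

        # Find the largest k > j such that a[j] < a[k].
        v = a[j]
        for k in range(n, j, -1):
            if v < a[k]:
                break

        # Swap them, then reverse the tail to get the next permutation.
        a[j], a[k] = a[k], a[j]
        a[j+1:] = a[j+1:][::-1]
        yield a[:]

def permutations_with_repetition(seq, length, maxrepeat):
    for combo in combinations_with_replacement(seq, length):
        # Skip combinations in which any item occurs more than maxrepeat times.
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        yield from lexico_permute(combo)

def permutations_with_limited_repetition(seq, length, maxrepeat, maxrun):
    for combo in combinations_with_replacement(seq, length):
        # Skip combinations in which any item occurs more than maxrepeat times.
        if any(combo.count(c) > maxrepeat for c in combo):
            continue
        for perm in lexico_permute(combo):
            # Skip permutations with a run of identical items longer than maxrun.
            if any(len(list(g)) > maxrun for _, g in groupby(perm)):
                continue
            yield perm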

# Test

src = list('abcde')
for lst in permutations_with_limited_repetition(src, 3, 2, 1):
    print(''.join(lst))

Output

aba
aca
ada
aea
bab
abc
acb
bac
bca
cab
cba
abd
adb
bad
bda
dab
dba
abe
aeb
bae
bea
eab
eba
cac
acd
adc
cad
cda
dac
dca
ace
aec
cae
cea
eac
eca
dad
ade
aed
dae
dea
ead
eda
eae
bcb
bdb
beb
cbc
bcd
bdc
cbd
cdb
dbc
dcb
bce
bec
cbe
ceb
ebc
ecb
dbd
bde
bed
dbe
deb
ebd
edb
ebe
cdc
cec
dcd
cde
ced
dce
dec
ecd
edc
ece
ded
ede
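As a cross-check on the listing above, a brute-force reference filter over itertools.product (only practical for tiny inputs such as this test) should produce the same set of strings. This is a sketch; `brute_force` is a hypothetical helper, not part of the answer's code.

```python
from itertools import groupby, product

def brute_force(seq, length, maxrepeat, maxrun):
    """Hypothetical reference: generate everything, then filter.
    Only feasible for tiny inputs, but useful for testing."""
    result = set()
    for p in product(seq, repeat=length):
        # Same total-count filter as the answer's first check.
        if any(p.count(c) > maxrepeat for c in p):
            continue
        # Same run-length filter as the answer's second check.
        if any(len(list(g)) > maxrun for _, g in groupby(p)):
            continue
        result.add(''.join(p))
    return result

expected = brute_force('abcde', 3, 2, 1)
print(len(expected))  # 80
```

For this test, maxrun=1 means no two neighbouring letters may be equal, which gives 5 × 4 × 4 = 80 strings, matching the 80 lines of output. Comparing `expected` with `{''.join(p) for p in permutations_with_limited_repetition(src, 3, 2, 1)}` is one way to check the generator against the reference.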

For an annotated (non-generator) version of the permutation algorithm, see the one I wrote last year.

Update

The first filter

if any(combo.count(c) > maxrepeat for c in combo):

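can be made more efficient by using the same groupby technique as the second filter. Because the input is sorted, each combo produced by combinations_with_replacement is sorted too, so equal items are adjacent and each groupby run covers every occurrence of an item. This takes a single pass instead of calling combo.count once per item: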

if any(len(list(g)) > maxrepeat for _, g in groupby(combo)):
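The equivalence that the update relies on (run length equals total count, but only for sorted combos) can be checked with a short sketch. `run_lengths` is a hypothetical helper introduced for illustration.

```python
from collections import Counter
from itertools import combinations_with_replacement, groupby

def run_lengths(seq):
    """Hypothetical helper: (item, run length) pairs for consecutive equal items."""
    return [(k, len(list(g))) for k, g in groupby(seq)]

# With sorted input every combo is sorted, so each item forms exactly one
# run and its run length equals its total count.
for combo in combinations_with_replacement('abc', 3):
    assert run_lengths(combo) == sorted(Counter(combo).items())

# On unsorted data the equivalence breaks: 'a' occurs twice but in two runs.
print(run_lengths('aba'))  # [('a', 1), ('b', 1), ('a', 1)]
```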

(I should have thought of that yesterday, but I hadn't originally planned to add the second filter; it was a last-minute inspiration.)
