Find all possible combinations with replacement in a string

Question

This is my current code:

from itertools import combinations, product

string = "abcd012345"

char = "01268abc"

for i, j in combinations(tuple(range(len(string))), 2):
    for char1, char2 in product(char, char):
        print(string[:i] + char1 + string[i+1:j] + char2 + string[j+1:])

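So the string is abcd012345, and I change two of its characters at a time to find all possible combinations. The replacement characters are 01268abc. In this example, we get 2880 combinations.

The goal is to specify which characters may appear at each position of the string. For example: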
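The 2880 figure can be checked with a little arithmetic: there are C(10, 2) = 45 ways to pick two positions and 8 × 8 = 64 ways to fill them. The sketch below is only a sanity check, not part of the original code; it also counts distinct strings, since some printed lines repeat whenever a character is replaced by itself.

```python
from itertools import combinations, product
from math import comb

string = "abcd012345"
char = "01268abc"

# Number of printed lines: choose 2 positions, then one replacement for each.
expected = comb(len(string), 2) * len(char) ** 2

# Collect the strings the loop would print instead of printing them.
results = [
    string[:i] + c1 + string[i + 1:j] + c2 + string[j + 1:]
    for i, j in combinations(range(len(string)), 2)
    for c1, c2 in product(char, repeat=2)
]

# Printed lines vs. distinct strings (duplicates make the second number smaller).
print(expected, len(results), len(set(results)))
```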

from itertools import combinations, product

string = "abcd012345"
# place   0123456789

char_to_change_for_place0 = "ab02"
char_to_change_for_place1 = "14ah"
char_to_change_for_place2 = "94nf"
char_to_change_for_place3 = "a"
char_to_change_for_place4 = "9347592"
char_to_change_for_place5 = "93478nvg"
char_to_change_for_place6 = "b"
char_to_change_for_place7 = ""
char_to_change_for_place8 = ""
char_to_change_for_place9 = "84n"

# Still the loop from the first snippet: it relies on `char` defined there
# and does not yet use the per-place strings above.
for i, j in combinations(tuple(range(len(string))), 2):
    for char1, char2 in product(char, char):
        print(string[:i] + char1 + string[i+1:j] + char2 + string[j+1:])

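Notes:

Some positions can be left empty, as with positions 7 and 8.
The number of positions will be 64.
The number of characters to change will be 4, not 2 as in the example.

I would be happy to learn from your solutions and ideas. Thank you.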

Answer 1

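This boils down to adding the current letter at each position of string to that position's replacements, and then creating all possible combinations of those options: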

from itertools import combinations, product

string = "abcd012345"

# must be the same length as string; each entry corresponds to the same index in string
p = ["ab02", "14ah", "94nf", "a", "9347592", "93478nvg", "b", "", "", "84n"]

errmsg = f"Keep them equal lengths: '{string}' ({len(string)}) vs {p} ({len(p)})"
assert len(p) == len(string), errmsg

# frozenset() eliminates duplicates among the letter in string + its replacements
d = {idx: frozenset(v + string[idx]) for idx, v in enumerate(p)}

# creating this list takes memory
all_of_em = [''.join(whatever) for whatever in product(*d.values())]

# if you hit a MemoryError creating the list, write to a file instead;
# this uses a generator, which limits memory usage, but the file is going
# to get BIG
# with open("words.txt", "w") as f:
#    for w in (''.join(whatever) for whatever in product(*d.values())):
#        f.write(w + "\n")

print(*all_of_em, f"\n{len(all_of_em)}", sep="\t")

Output:

2and2g234n      2and2g2348      2and2g2344      2and2g2345      2and27b34n      
[...snipp...]
249d99234n      249d992348      249d992344      249d992345      
100800
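The total can be confirmed without building the list at all, because it is just the product of the number of distinct options at each position. The following is a minimal sketch of that check (it assumes Python 3.8+ for `math.prod`). Note also that the order of the words printed above depends on set iteration order, so it may differ between runs.

```python
from math import prod

string = "abcd012345"
p = ["ab02", "14ah", "94nf", "a", "9347592", "93478nvg", "b", "", "", "84n"]

# Distinct options per position: the replacements plus the original character.
sizes = [len(set(v + string[idx])) for idx, v in enumerate(p)]

# The number of generated words is the product of the per-position sizes.
total = prod(sizes)
print(sizes, total)
```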

If the order of the letters in the replacements matters to you, use

d = {idx: (v if string[idx] in v else string[idx]+v) for idx, v in enumerate(p)}

instead, which outputs:

abcd012345      abcd012348      [...]     2hfa2gb344      2hfa2gb34n      115200

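The difference in count arises because the duplicate 9 in "9347592" is removed when frozensets are used: position 4 then offers 7 distinct characters instead of 8, and 115200 × 7/8 = 100800.

To get only the words with fewer than 5 changes: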
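If both a stable order and deduplication are wanted, one possible variation (not from the answer itself) is to deduplicate with `dict.fromkeys`, which keeps the first occurrence of each character in insertion order on Python 3.7+. The helper name `ordered_options` is made up for this sketch. Unlike the expression above, it always puts the original character first, even when that character also appears later in the replacements.

```python
from itertools import product

string = "abcd012345"
p = ["ab02", "14ah", "94nf", "a", "9347592", "93478nvg", "b", "", "", "84n"]

def ordered_options(original, replacements):
    # Hypothetical helper: original character first, then the replacements,
    # dropping repeated characters while keeping first-seen order.
    return "".join(dict.fromkeys(original + replacements))

d = {idx: ordered_options(string[idx], v) for idx, v in enumerate(p)}

# Consume the product lazily: remember the first word and count the rest.
words = product(*d.values())
first = "".join(next(words))
count = 1 + sum(1 for _ in words)
print(first, count)
```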

# use a generator expression to reduce memory usage
all_of_em = (''.join(whatever) for whatever in product(*d.values()))

# build the list of words with fewer than 5 changes from the generator above
fewer = [w for w in all_of_em if sum(a != b for a, b in zip(w, string)) < 5]
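Filtering the full product still walks through every candidate, which grows quickly with the 64 positions mentioned in the question. A different approach, sketched here as an assumption rather than as part of the answer, is to pick the positions to change first and only then fill them in, so that every generated word has exactly `k` changes. The function name `exactly_k_changes` is hypothetical.

```python
from itertools import combinations, product

def exactly_k_changes(string, replacements, k):
    """Yield strings that differ from `string` in exactly k positions."""
    # Per position, keep only characters that really change it,
    # deduplicated in first-seen order.
    options = [
        "".join(dict.fromkeys(c for c in repl if c != orig))
        for orig, repl in zip(string, replacements)
    ]
    # Only positions with at least one real replacement can be changed.
    changeable = [i for i, opts in enumerate(options) if opts]
    for positions in combinations(changeable, k):
        for chars in product(*(options[i] for i in positions)):
            word = list(string)
            for i, c in zip(positions, chars):
                word[i] = c
            yield "".join(word)

string = "abcd012345"
p = ["ab02", "14ah", "94nf", "a", "9347592", "93478nvg", "b", "", "", "84n"]

four = list(exactly_k_changes(string, p, 4))
print(len(four), four[:3])
```

Because the positions are chosen up front, the amount of work depends on the number of results rather than on the size of the full product. For "up to k" changes, the same generator could be called for each count from 0 to k.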

python

combinations

itertools