Generate all permutations including abbreviations with weightages

My string:

name_target = "ARUN GULABRAO INDULKAR"

I want to generate all permutations of the original names and their abbreviations, and assign a weight to each permutation:

[ARUNGULABRAOINDULKAR, 1]
[ARUNGINDULKAR, 0.9]
[ARUNGULABRAOI, 0.9]
[AGULABRAOINDULKAR, 0.9]
[ARUNGI, 0.8]
[AGINDULKAR, 0.8]
[AGULABRAOI, 0.8]
[ARUNINDULKARGULABRAO, 1]
[ARUNIGULABRAO, 0.9]
[ARUNINDULKARG, 0.9]
[AINDULKARGULABRAO, 0.9]
[ARUNIG, 0.8]
[AIGULABRAO, 0.8]
[AINDULKARG, 0.8]
[GULABRAOARUNINDULKAR, 1]
[GULABRAOAINDULKAR, 0.9]
[GULABRAOARUNI, 0.9]
[GARUNINDULKAR, 0.9]
[GULABRAOAI, 0.8]
[GAINDULKAR, 0.8]
[GARUNI, 0.8]
[GULABRAOINDULKARARUN, 1]
[GULABRAOIARUN, 0.9]
[GULABRAOINDULKARA, 0.9]
[GINDULKARARUN, 0.9]
[GULABRAOIA, 0.8]
[GIARUN, 0.8]
[GINDULKARA, 0.8]
[INDULKARARUNGULABRAO, 1]
[INDULKARAGULABRAO, 0.9]
[INDULKARARUNG, 0.9]
[IARUNGULABRAO, 0.9]
[INDULKARAG, 0.8]
[IAGULABRAO, 0.8]
[IARUNG, 0.8]
[INDULKARGULABRAOARUN, 1]
[INDULKARGARUN, 0.9]
[INDULKARGULABRAOA, 0.9]
[IGULABRAOARUN, 0.9]
[INDULKARGA, 0.8]
[IGARUN, 0.8]
[IGULABRAOA, 0.8]

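I don't care about the output data structure; it can be anything. If no abbreviation is used and every name appears in full, the weight is 1.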

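Each abbreviation reduces the weight by 10% (0.1). For example, ARUNGINDULKAR in the second output line gets 0.9 because the middle name is abbreviated. ARUNGI gets 0.8 because both the middle name and the last name are abbreviated.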
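The weighting rule described above can be sketched as a small function. The name `weight_for` and its `step` parameter are hypothetical, and rounding to one decimal place is an assumption made here to avoid floating-point noise; neither comes from the original post.

```python
def weight_for(num_abbreviated: int, step: float = 0.1) -> float:
    """Weight of a name variant in which `num_abbreviated` parts are initials."""
    # round() is an assumption: it keeps results such as 0.7 free of float noise
    return round(1 - step * num_abbreviated, 1)

print(weight_for(0))  # 1.0 (all names in full)
print(weight_for(1))  # 0.9 (one name abbreviated)
print(weight_for(2))  # 0.8 (two names abbreviated)
```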

I have already generated the first set of permutations efficiently using itertools.permutations(name_target).
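For reference, a minimal sketch of that first step, assuming the permutations are taken over the space-separated words rather than over the raw string (passing the string itself to itertools.permutations would permute individual characters). The variable names `words` and `full_orderings` are illustrative only.

```python
from itertools import permutations

name_target = "ARUN GULABRAO INDULKAR"
words = name_target.split()  # ['ARUN', 'GULABRAO', 'INDULKAR']

# Every ordering of the full words, joined without separators
full_orderings = ["".join(p) for p in permutations(words)]

print(full_orderings[0])    # ARUNGULABRAOINDULKAR
print(len(full_orderings))  # 6
```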

I don't know how to combine the abbreviations. name_target can contain a variable number of parts when split on ' '.

Please ignore any duplicates in the expected output.

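You can use recursion with a generator to build the abbreviation combinations. itertools.permutations creates every possible ordering of the original input names, and each ordering is passed to get_combos, which produces the abbreviation combinations. get_combos pairs every name component it generates with a boolean flag (True for a full name, False for an abbreviation), which allows the weight to be computed: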

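from itertools import permutations as prmt

def get_combos(d, l, c=[]):
    # d: name parts still to process, l: total number of parts,
    # c: (text, is_full) pairs chosen so far (never mutated, only copied)
    if d:
        # Keep the next part as a full name
        yield from get_combos(d[1:], l, c + [(d[0], True)])
        # Abbreviate it only while that leaves at least one part unabbreviated
        if sum(not b for _, b in c) + 1 < l:
            yield from get_combos(d[1:], l, c + [(d[0][0], False)])
    else:
        # Each abbreviation lowers the weight by 0.1; round() avoids float noise
        yield [''.join(a for a, _ in c), round(1 - sum(0.1 for _, b in c if not b), 1)]

name_target = "ARUN GULABRAO INDULKAR"
n = name_target.split()
l = len(n)
# Run get_combos over every ordering of the name parts
result = [i for b in prmt(n, l) for i in get_combos(b, l)]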

Output:

[['ARUNGULABRAOINDULKAR', 1], ['ARUNGULABRAOI', 0.9], ['ARUNGINDULKAR', 0.9], ['ARUNGI', 0.8], ['AGULABRAOINDULKAR', 0.9], ['AGULABRAOI', 0.8], ['AGINDULKAR', 0.8], ['ARUNINDULKARGULABRAO', 1], ['ARUNINDULKARG', 0.9], ['ARUNIGULABRAO', 0.9], ['ARUNIG', 0.8], ['AINDULKARGULABRAO', 0.9], ['AINDULKARG', 0.8], ['AIGULABRAO', 0.8], ['GULABRAOARUNINDULKAR', 1], ['GULABRAOARUNI', 0.9], ['GULABRAOAINDULKAR', 0.9], ['GULABRAOAI', 0.8], ['GARUNINDULKAR', 0.9], ['GARUNI', 0.8], ['GAINDULKAR', 0.8], ['GULABRAOINDULKARARUN', 1], ['GULABRAOINDULKARA', 0.9], ['GULABRAOIARUN', 0.9], ['GULABRAOIA', 0.8], ['GINDULKARARUN', 0.9], ['GINDULKARA', 0.8], ['GIARUN', 0.8], ['INDULKARARUNGULABRAO', 1], ['INDULKARARUNG', 0.9], ['INDULKARAGULABRAO', 0.9], ['INDULKARAG', 0.8], ['IARUNGULABRAO', 0.9], ['IARUNG', 0.8], ['IAGULABRAO', 0.8], ['INDULKARGULABRAOARUN', 1], ['INDULKARGULABRAOA', 0.9], ['INDULKARGARUN', 0.9], ['INDULKARGA', 0.8], ['IGULABRAOARUN', 0.9], ['IGULABRAOA', 0.8], ['IGARUN', 0.8]]
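As an alternative to the recursive generator, the same result can probably be reached iteratively with itertools.product, choosing between the full name and the initial for each part. This is only a sketch under stated assumptions: `name_variants` is a hypothetical helper, the dict return type is one possible choice (the question says any structure is acceptable), rounding to one decimal is an assumption, and keeping the highest weight for a repeated string is one way to handle the duplicates the question asks to ignore.

```python
from itertools import permutations, product

def name_variants(name, step=0.1):
    """Map each concatenated name variant to its weight (hypothetical helper)."""
    parts = name.split()
    variants = {}
    for order in permutations(parts):
        # For each part, offer (text, is_abbreviated): the full name or its initial
        choices = [((p, False), (p[0], True)) for p in order]
        for combo in product(*choices):
            n_abbr = sum(flag for _, flag in combo)
            if n_abbr == len(parts):
                continue  # skip the all-initials variant, as the answer above does
            key = "".join(text for text, _ in combo)
            weight = round(1 - step * n_abbr, 1)
            # If the same string appears twice, keep the highest weight
            variants[key] = max(weight, variants.get(key, 0))
    return variants

variants = name_variants("ARUN GULABRAO INDULKAR")
print(len(variants))       # 42
print(variants["ARUNGI"])  # 0.8
```

A dict makes lookups by candidate string direct and removes duplicates as a side effect; a list of pairs, as in the answer above, preserves generation order instead.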