Grouping strings in a list that share the same 4-letter substring in Python

Here is a sample list:

["aaaa", "cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd" , "aaaab" , "aaaa".....]

The expected output should look like this:

[("fish","ffish") , ("bird","birdd"), ("aaaa","aaaab","aaaa") ....]

Or all possible pairwise matches:

[("fish","ffish"),("ffish","fish"),("bird","birdd"), ("birdd","bird"),("aaaa","aaaab"),("aaaa","aaaa"),("aaaab","aaaa") ....] 
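One possible way to get from the grouped form to the pairwise form is to expand each group with `itertools.permutations`. This is only a sketch: the helper name `all_pairs` is made up for illustration, and it assumes the groups have already been built.

```python
from itertools import permutations

def all_pairs(groups):
    """Expand each group into every ordered pair of its members."""
    pairs = []
    for group in groups:
        # permutations works on positions, so a repeated word such as
        # "aaaa" can be paired with its own duplicate.
        pairs.extend(permutations(group, 2))
    return pairs

groups = [("fish", "ffish"), ("bird", "birdd")]
print(all_pairs(groups))
# [('fish', 'ffish'), ('ffish', 'fish'), ('bird', 'birdd'), ('birdd', 'bird')]
```

A group of n members yields n * (n - 1) ordered pairs, so the pairwise form grows quickly for large groups.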

This works for small lists (the time complexity makes it slow for large ones):

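lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", 'fourrr', 'four']

lenght_four = {}      # 4-letter word -> list of items grouped with it
more_than_four = []   # items longer than 4 characters

for item in lst:
    if len(item) > 4:
        more_than_four.append(item)
    elif len(item) == 4:
        exist = lenght_four.get(item)
        if exist is not None:
            # Repeated 4-letter word: record the duplicate.
            exist.append(item)
        else:
            lenght_four[item] = []

# Attach every longer item to each 4-letter key it contains.
for item in more_than_four:
    for k, v in lenght_four.items():
        if k in item:
            v.append(item)

# Keep only the keys that matched at least one other item.
res = [(k, *v) for k, v in lenght_four.items() if v]
print(res)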

Output:

[('aaaa', 'aaaa', 'aaaab'), ('fish', 'ffish'), ('bird', 'birdd'), ('four', 'fourrr')]

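In a single pass over the list we do the following (thanks @VPfB):

1- Skip items shorter than 4 characters.

2- Add items of length 4 to the dictionary.

3- Add the remaining items, those with len(item) > 4, to a separate list.

Then we iterate over the items with len(item) > 4 and check whether any of the 4-letter keys is a substring of them.

Finally we keep the entries of the lenght_four dictionary whose values are not empty lists.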
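For larger lists, the nested loop compares every longer item against every 4-letter key. One possible alternative, sketched below, slides a 4-character window across each longer item and looks the window up in the dictionary, so the work depends on the total number of characters rather than on the number of keys. The function name `group_by_four` is hypothetical, and this is a variation on the approach above, not the answer's own code.

```python
from collections import Counter

def group_by_four(words):
    """Group each 4-letter word with the longer items that contain it."""
    # Count 4-letter words; a word seen n times contributes n - 1 duplicates.
    counts = Counter(w for w in words if len(w) == 4)
    groups = {k: [k] * (n - 1) for k, n in counts.items()}

    for w in words:
        if len(w) <= 4:
            continue
        seen = set()  # avoid adding w twice to the same key
        for i in range(len(w) - 3):
            window = w[i:i + 4]
            if window in groups and window not in seen:
                seen.add(window)
                groups[window].append(w)

    # Drop keys that matched nothing, as in the original version.
    return [(k, *v) for k, v in groups.items() if v]

lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", "fourrr", "four"]
print(group_by_four(lst))
```

On the sample list this should produce the same groups as the nested-loop version.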

You can combine filter with a comprehension to get the desired result.

>>> data = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]
>>> list(filter(lambda i: len(i)>=2,[tuple(x
                                       for x in data if item in x)
                                 for item in filter(lambda i: len(i) == 4, data)]))

#output: [('fish', 'ffish'), ('bird', 'birdd')]

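If there are only similar words:

>>> wlist = ["cat" , "ccaatt", "ccccattt" , "fish" , "ffish", "ffffishhh" , "dddog", "dog", "doog" ,"bird" , "birdd"]
>>> d = {}
>>> for w in wlist:
...  k = ''.join(dict.fromkeys(w).keys())  # drop repeated letters, keep order
...  if len(k) == 4:
...   d[k] = w  # a later word overwrites an earlier one with the same key
... 
>>> d
{'fish': 'ffffishhh', 'bird': 'birdd'}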

Otherwise, when there are more occurrences:

>>> wlist = ["cat" , "ccaatt", "ccccattt" , "fish" , "ffish", "ffffishhh" , "dddog", "dog", "doog" ,"bird" , "birdd"]
>>> d = {}
>>> for w in wlist:
...  k = ''.join(dict.fromkeys(w).keys())
...  if k != w and len(k) == 4:
...   if k in d:
...    d[k].append(w)
...   else:
...    d[k] = [w]
... 
>>> d
{'fish': ['ffish', 'ffffishhh'], 'bird': ['birdd']}
>>>
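Both snippets build their key with `dict.fromkeys`, which relies on dicts preserving insertion order (guaranteed since Python 3.7). The small sketch below isolates that step; the helper name `dedupe_letters` is made up for illustration. Note that it removes every repeated letter, not only adjacent ones, so it is a different test from the substring check used in the question.

```python
def dedupe_letters(word):
    """Keep the first occurrence of each letter, preserving order."""
    return "".join(dict.fromkeys(word))

for word in ["ffish", "ffffishhh", "birdd", "doog", "noon"]:
    print(word, "->", dedupe_letters(word))
# ffish -> fish
# ffffishhh -> fish
# birdd -> bird
# doog -> dog
# noon -> no
```

Because "noon" collapses to "no", words that legitimately repeat a letter never produce a 4-letter key of their own, which may or may not be what you want.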

In both cases you can convert the result from a dict to a list of tuples:

[(k, v) for k, v in d.items() if v]

For example:

>>> [(k, v) for k, v in d.items() if v]
[('fish', ['ffish', 'ffffishhh']), ('bird', ['birdd'])]
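If flat tuples like the ones in the question are preferred, the key and its list can be unpacked into a single tuple. A small sketch, using the dict from the second example as literal data:

```python
d = {'fish': ['ffish', 'ffffishhh'], 'bird': ['birdd']}

# Unpack each list so the key and its matches sit in one flat tuple.
flat = [(k, *v) for k, v in d.items() if v]
print(flat)
# [('fish', 'ffish', 'ffffishhh'), ('bird', 'birdd')]
```

This only makes sense when the values are lists. In the first example the values are plain strings, and `*v` would unpack them into single characters.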

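If the pattern stays the same, that is, the list alternates between a word and its variant, you can use zip and a list comprehension.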

print([(x, y) for x, y in zip(lst[0::2], lst[1::2]) if x in y and len(x)==4])

Another way:

lst = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]

# Stop at len(lst) - 1 so lst[x+1] never runs past the end of an odd-length list.
print([(lst[x], lst[x+1]) for x in range(0, len(lst) - 1, 2) if len(lst[x]) == 4 and lst[x] in lst[x+1]])

Output:

[('fish', 'ffish'), ('bird', 'birdd')]
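Both versions above assume each word sits at an even index with its variant right after it. If the variant always directly follows the word but the pairs are not guaranteed to be aligned that way, a possible variation is to compare every adjacent pair instead. The function name `adjacent_pairs` is hypothetical.

```python
def adjacent_pairs(words):
    """Pair each 4-letter word with the next item if that item contains it."""
    # zip(words, words[1:]) walks over every neighbouring pair,
    # regardless of whether the first element is at an even index.
    return [(a, b) for a, b in zip(words, words[1:])
            if len(a) == 4 and a in b]

lst = ["cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird", "birdd"]
print(adjacent_pairs(lst))
# [('fish', 'ffish'), ('bird', 'birdd')]
```

Unlike the grouping approaches earlier in the thread, this still misses matches that are not next to each other in the list.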