Grouping strings in a list that share the same 4-letter substring in Python
Here is a sample list:
["aaaa", "cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd" , "aaaab" , "aaaa".....]
The expected output should look like this:
[("fish","ffish") , ("bird","birdd"), ("aaaa","aaaab","aaaa") ....]
Or all possible pairwise matches:
[("fish","ffish"),("ffish","fish"),("bird","birdd"), ("birdd","bird"),("aaaa","aaaab"),("aaaa","aaaa"),("aaaab","aaaa") ....]
This works for small lists (because of its time complexity):
lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", 'fourrr', 'four']
lenght_four = {}
more_than_four = []
for item in lst:
    if len(item) > 4:
        more_than_four.append(item)
    elif len(item) == 4:
        exist = lenght_four.get(item)
        if exist is not None:
            # A repeated 4-letter word is recorded under its own key.
            exist.append(item)
        else:
            lenght_four[item] = []
# Check every 4-letter key against every longer item.
for item in more_than_four:
    for k, v in lenght_four.items():
        if k in item:
            v.append(item)
res = [(k, *v) for k, v in lenght_four.items() if v]
print(res)
Output:
[('aaaa', 'aaaa', 'aaaab'), ('fish', 'ffish'), ('bird', 'birdd'), ('four', 'fourrr')]
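By iterating over the list once, we do all of the following in a single pass (thanks @VPfB):
1- Skip the items shorter than 4 characters.
2- Add the items of length 4 to a dictionary.
3- Collect the remaining items, those with len(item) > 4, in a separate list.
Then we iterate over the items with len(item) > 4 to check whether any of the 4-letter keys is a substring of them.
Finally, we keep the entries of the lenght_four dictionary whose values are not empty lists.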
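The nested loop above compares every longer item with every 4-letter key, which is where the cost on large lists comes from. As a rough sketch of one alternative (not part of the original answer), each longer item can instead be cut into its 4-character windows, and each window looked up in the dictionary directly. The function name group_by_window and its size parameter are made up for this illustration.

```python
def group_by_window(words, size=4):
    """Group longer strings under each `size`-letter word they contain."""
    groups = {}   # size-letter word -> list of matching strings
    longer = []   # strings longer than `size`
    for w in words:
        if len(w) > size:
            longer.append(w)
        elif len(w) == size:
            if w in groups:
                groups[w].append(w)   # repeated key, recorded as in the code above
            else:
                groups[w] = []
    for w in longer:
        # Distinct windows of the string, in order of first appearance.
        windows = dict.fromkeys(w[i:i + size] for i in range(len(w) - size + 1))
        for win in windows:
            if win in groups:         # dictionary lookup instead of scanning all keys
                groups[win].append(w)
    return [(k, *v) for k, v in groups.items() if v]


lst = ["aaaa", "cat", "ccaatt", "fish", "ffish", "dog", "doog", "bird",
       "birdd", "aaaab", "aaaa", "fourrr", "four"]
print(group_by_window(lst))
# [('aaaa', 'aaaa', 'aaaab'), ('fish', 'ffish'), ('bird', 'birdd'), ('four', 'fourrr')]
```

The work per longer item then depends on its own length rather than on the number of 4-letter keys, which should help when there are many keys.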
You can combine filter and a comprehension to get the desired result.
>>> data = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]
>>> list(filter(lambda i: len(i) >= 2,
...             [tuple(x for x in data if item in x)
...              for item in filter(lambda i: len(i) == 4, data)]))
#output: [('fish', 'ffish'), ('bird', 'birdd')]
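If each 4-letter word has only one similar word (with several, the last one seen overwrites the earlier ones, as 'ffffishhh' does here):
>>> wlist = ["cat" , "ccaatt", "ccccattt" , "fish" , "ffish", "ffffishhh" , "dddog", "dog", "doog" ,"bird" , "birdd"]
>>> d = {}
>>> for w in wlist:
...     k = ''.join(dict.fromkeys(w).keys())  # drop repeated letters, keep order
...     if len(k) == 4:
...         d[k] = w
...
>>> d
{'fish': 'ffffishhh', 'bird': 'birdd'}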
Otherwise, when a word can have several similar words, collect them in a list:
>>> wlist = ["cat" , "ccaatt", "ccccattt" , "fish" , "ffish", "ffffishhh" , "dddog", "dog", "doog" ,"bird" , "birdd"]
>>> d = {}
>>> for w in wlist:
...     k = ''.join(dict.fromkeys(w).keys())
...     if k != w and len(k) == 4:
...         if k in d:
...             d[k].append(w)
...         else:
...             d[k] = [w]
...
>>> d
{'fish': ['ffish', 'ffffishhh'], 'bird': ['birdd']}
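It may help to see what the ''.join(dict.fromkeys(w)) key actually does: it drops every repeated letter, not only adjacent ones, so it is a different test from the substring check in the other answers. The small sketch below only illustrates that behaviour; the helper name dedupe_letters is made up here.

```python
def dedupe_letters(word):
    """Drop every repeated letter, keeping first occurrences in order."""
    return ''.join(dict.fromkeys(word))


print(dedupe_letters("ffish"))    # fish
print(dedupe_letters("fishes"))   # fishe  (contains "fish", but the key has 5 letters)
print(dedupe_letters("noon"))     # no     (a 4-letter word whose key is not itself)
```

So a word such as "fishes" would not be grouped under "fish" by this key, although "fish" is a substring of it.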
In both cases, you can convert the result from a dict to a list of tuples:
[(k, v) for k, v in d.items() if v]
For example:
>>> [(k, v) for k, v in d.items() if v]
[('fish', ['ffish', 'ffffishhh']), ('bird', ['birdd'])]
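The question also mentions "all possible pairwise matches". Assuming the dict maps each 4-letter word to a list of its similar words (as in the second snippet), one possible way to derive ordered pairs is itertools.permutations over each group. The function name all_pairs is hypothetical, and note that this sketch also pairs the similar words with each other.

```python
from itertools import permutations


def all_pairs(groups):
    """Return every ordered pair of distinct members within each group."""
    pairs = []
    for key, variants in groups.items():
        members = [key, *variants]             # the 4-letter word plus its matches
        pairs.extend(permutations(members, 2)) # ordered pairs, both directions
    return pairs


print(all_pairs({'bird': ['birdd']}))
# [('bird', 'birdd'), ('birdd', 'bird')]
```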
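If the pattern stays the same (each word is immediately followed by its similar word), you can use zip and a list comprehension.
lst = ["cat" , "ccaatt" , "fish" , "ffish" , "dog", "doog" ,"bird" , "birdd"]
print([(x, y) for x, y in zip(lst[0::2], lst[1::2]) if x in y and len(x) == 4])
Another way, using indices (stopping at len(lst) - 1 avoids an IndexError on an odd-length list):
print([(lst[x], lst[x+1]) for x in range(0, len(lst) - 1, 2) if len(lst[x]) == 4 and lst[x] in lst[x+1]])
Output:
[('fish', 'ffish'), ('bird', 'birdd')]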