Find group of strings that are anagrams
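This question is based on this problem on LintCode. I have a working solution, but it takes too long on the large test cases. How can I improve it? Perhaps I can reduce the number of comparisons made in the outer loop.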
class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        # write your code here
        ret=set()
        for i in range(0,len(strs)):
            for j in range(i+1,len(strs)):
                if i in ret and j in ret:
                    continue
                if Solution.isanagram(strs[i],strs[j]):
                    ret.add(i)
                    ret.add(j)
        return [strs[i] for i in list(ret)]

    @staticmethod
    def isanagram(s, t):
        if len(s)!=len(t):
            return False
        chars={}
        for i in s:
            if i in chars:
                chars[i]+=1
            else:
                chars[i]=1
        for i in t:
            if i not in chars:
                return False
            else:
                chars[i]-=1
                if chars[i]<0:
                    return False
        for i in chars:
            if chars[i]!=0:
                return False
        return True
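Update: to clarify, I am not looking for built-in Pythonic solutions such as the already-optimized Counter. I have already applied Mike's suggestions, but it still times out.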
Why not this?
str1 = "cafe"
str2 = "face"

def isanagram(s1, s2):
    # Two strings are anagrams if they have the same characters once sorted
    return sorted(s1) == sorted(s2)

if isanagram(str1, str2):
    print("Woo")
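The sorted comparison is short, but if it is called inside the original double loop, each string is re-sorted for every pair it takes part in. A minimal sketch, keeping the pairwise loop from the question, that sorts each string only once up front (the function name `anagram_indices` is illustrative, not from any answer):

```python
def anagram_indices(strs):
    # Sort each string's characters exactly once, before any comparison
    keys = ["".join(sorted(s)) for s in strs]
    found = set()
    for i in range(len(strs)):
        for j in range(i + 1, len(strs)):
            # Comparing two precomputed keys is a plain string comparison
            if keys[i] == keys[j]:
                found.add(i)
                found.add(j)
    # sorted() returns the matching strings in their original input order
    return [strs[i] for i in sorted(found)]


print(anagram_indices(["cafe", "face", "tea", "eat", "bee"]))
# ['cafe', 'face', 'tea', 'eat']
```

This still performs O(n²) pair comparisons; only the per-pair cost drops.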
Skip the strings you have already put into the set; don't test them again.
# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
    # write your code here
    ret=set()
    for i in range(0,len(strs)):
        for j in range(i+1,len(strs)):
            # If both anagrams exist in set, there is no need to compare them.
            if i in ret and j in ret:
                continue
            if Solution.isanagram(strs[i],strs[j]):
                ret.add(i)
                ret.add(j)
    return [strs[i] for i in list(ret)]
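The check above skips a pair only when both indices are already in the set. Because being an anagram is transitive, a somewhat stronger pruning appears to be safe: if index i is already in the set, all of its later anagrams were collected when its earliest anagram was the outer index, so the whole inner loop can be skipped; and an inner j that is already in the set was grouped with an earlier string, so it cannot be an anagram of an i that is not yet in the set. A sketch of that refinement (my own variation, not from the answer; a sorted-character comparison stands in for `isanagram`, and the name `anagrams_pruned` is illustrative):

```python
def anagrams_pruned(strs):
    # Stand-in for the isanagram helper from the question
    def is_anagram(s, t):
        return len(s) == len(t) and sorted(s) == sorted(t)

    ret = set()
    for i in range(len(strs)):
        # i already grouped: its later anagrams are already in ret as well
        if i in ret:
            continue
        for j in range(i + 1, len(strs)):
            # j already grouped with an earlier string, so it cannot match i
            if j in ret:
                continue
            if is_anagram(strs[i], strs[j]):
                ret.add(i)
                ret.add(j)
    # sorted() keeps the original input order in the result
    return [strs[i] for i in sorted(ret)]


print(anagrams_pruned(["cat", "act", "rat", "hut", "tar", "tact"]))
# ['cat', 'act', 'rat', 'tar']
```

In the worst case (no anagrams at all) nothing is pruned and this is still O(n²) comparisons, which is why the dictionary-based answers scale better.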
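You can also compare the lengths in the anagram test before iterating over the letters: strings of different lengths can never be anagrams. Also, as soon as a counter in chars drops to -1 while you consume the characters of t, just return False. Don't iterate over chars again.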
@staticmethod
def isanagram(s, t):
    # Test strings are the same length
    if len(s) != len(t):
        return False
    chars={}
    for i in s:
        if i in chars:
            chars[i]+=1
        else:
            chars[i]=1
    for i in t:
        if i not in chars:
            return False
        else:
            chars[i]-=1
            # If this is below 0, return false
            if chars[i] < 0:
                return False
    # Same length and no count below 0 means every count is back to 0,
    # so there is no need to loop over chars again
    return True
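If built-ins such as Counter are off the table, as the update says, a list indexed by letter avoids hashing every character into a dict. A sketch assuming the input contains only lowercase ASCII letters a-z (the problem statement is not quoted here, so this is an assumption; any other character would land in the wrong slot), with the illustrative name `is_anagram_lowercase`:

```python
def is_anagram_lowercase(s, t):
    # Assumes both strings contain only the lowercase ASCII letters a-z
    if len(s) != len(t):
        return False
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord("a")] += 1
    for ch in t:
        idx = ord(ch) - ord("a")
        counts[idx] -= 1
        # t contains more copies of ch than s does
        if counts[idx] < 0:
            return False
    # Equal lengths and no negative count mean every count is zero
    return True


print(is_anagram_lowercase("listen", "silent"))  # True
```

This only makes each comparison cheaper by a constant factor; the number of comparisons in the double loop is unchanged.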
As a complement to @Mike's excellent answer, here is a nice Pythonic way:
import collections

class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        patterns = Solution.find_anagram_words(strs)
        return [word for word in strs if ''.join(sorted(word)) in patterns]

    @staticmethod
    def find_anagram_words(strs):
        anagrams = collections.Counter(''.join(sorted(word)) for word in strs)
        return {word for word, times in anagrams.items() if times > 1}
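The version above sorts every word twice: once to count the patterns and once more to filter. A small variation, sketched as a standalone function (the name `anagrams_single_sort` is illustrative), keeps the list of keys so each word is sorted only once:

```python
import collections


def anagrams_single_sort(strs):
    # Sort each word once and reuse the key for both counting and filtering
    keys = ["".join(sorted(word)) for word in strs]
    counts = collections.Counter(keys)
    # Keep words whose pattern occurs more than once, in input order
    return [word for word, key in zip(strs, keys) if counts[key] > 1]


print(anagrams_single_sort(["cat", "act", "rat", "hut", "tar", "tact"]))
# ['cat', 'act', 'rat', 'tar']
```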
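Instead of comparing all pairs of strings, you can create a dictionary (or a collections.defaultdict) mapping each of the letter-counts to the words having those counts. To get the letter-counts, you can use collections.Counter. After that, you only need to take the values from that dictionary. If you want all words that are anagrams of any other word, just merge the lists that have more than one entry.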
from collections import Counter, defaultdict

strings = ["cat", "act", "rat", "hut", "tar", "tact"]
anagrams = defaultdict(list)
for s in strings:
    # key: the (letter, count) pairs, frozen so they are hashable
    anagrams[frozenset(Counter(s).items())].append(s)
print([v for v in anagrams.values()])
# [['cat', 'act'], ['rat', 'tar'], ['hut'], ['tact']]  (insertion order, Python 3.7+)
print([x for v in anagrams.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']
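Of course, if you don't want to use the built-in functionality, with just a few more lines you can use a regular dict instead of defaultdict and write your own Counter, similar to what you do in your isanagram method, just without the comparison part.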
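Following that suggestion, a sketch that uses only a regular dict and a hand-written letter counter (the names `letter_counts` and `group_anagrams` are illustrative):

```python
def letter_counts(word):
    # Hand-rolled stand-in for collections.Counter: letter -> occurrences
    counts = {}
    for ch in word:
        counts[ch] = counts.get(ch, 0) + 1
    # A frozenset of (letter, count) pairs is hashable, so it can be a dict key
    return frozenset(counts.items())


def group_anagrams(strings):
    # Plain dict instead of defaultdict: create each list on first sight
    groups = {}
    for s in strings:
        key = letter_counts(s)
        if key not in groups:
            groups[key] = []
        groups[key].append(s)
    return groups


result = group_anagrams(["cat", "act", "rat", "hut", "tar", "tact"])
print([x for v in result.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']
```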
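Your solution is slow because it compares every pair of strings (O(n²) comparisons) instead of taking advantage of Python's data structures.
Here is a solution that collects the results in a dictionary keyed by each word's sorted letters, so every word is processed only once: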
class Solution:
    def anagrams(self, strs):
        d = {}
        for word in strs:
            key = tuple(sorted(word))
            try:
                d[key].append(word)
            except KeyError:
                d[key] = [word]
        return [w for ws in d.values() for w in ws if len(ws) > 1]
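Sorting makes each key cost O(k log k) for a word of length k. If the strings are known to contain only lowercase ASCII letters (an assumption; the question does not say), a tuple of 26 letter counts can serve as the dictionary key instead and is built in O(k). A sketch with the illustrative name `anagrams_by_counts`:

```python
def anagrams_by_counts(strs):
    # Assumes every string contains only the lowercase ASCII letters a-z
    groups = {}
    for word in strs:
        counts = [0] * 26
        for ch in word:
            counts[ord(ch) - ord("a")] += 1
        # Lists are unhashable, so the counts become a tuple key
        groups.setdefault(tuple(counts), []).append(word)
    return [w for ws in groups.values() if len(ws) > 1 for w in ws]


print(anagrams_by_counts(["cat", "act", "rat", "hut", "tar", "tact"]))
# ['cat', 'act', 'rat', 'tar']
```

For short words the C-implemented `sorted` may still be faster in practice than this pure-Python loop, so it is worth measuring before switching.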
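If you use LINQ in C#, the same grouping takes a single expression:
using System.Linq; // GroupBy, OrderBy, Select, ToList
string[] strs = ...; // input string array
var result = strs.GroupBy(x => new string(x.ToCharArray().OrderBy(z => z).ToArray())).Select(g => g.ToList()).ToList();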
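For comparison, a rough Python counterpart of that LINQ expression can be sketched with `itertools.groupby`. Unlike LINQ's `GroupBy`, `itertools.groupby` only merges adjacent items, so the input is sorted by the key first; the groups therefore come out in key order rather than in order of first appearance (the name `group_like_linq` is illustrative):

```python
from itertools import groupby


def group_like_linq(strs):
    # Canonical key: the word's characters in sorted order
    def key(word):
        return "".join(sorted(word))

    # groupby only merges neighbours, so sort by the same key first
    return [list(g) for _, g in groupby(sorted(strs, key=key), key=key)]


print(group_like_linq(["cat", "act", "rat", "hut", "tar", "tact"]))
# [['cat', 'act'], ['tact'], ['rat', 'tar'], ['hut']]
```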
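To group anagrams in Python: first sort the characters of each word. Then build a dictionary whose keys are these sorted words and whose values are the indices of the words that produce each key. Each value is therefore the list of positions of one group of anagrams in the input.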
def groupAnagrams(words):
    # sort the characters of each word in the list
    A = [''.join(sorted(word)) for word in words]
    groups = {}  # named "groups" to avoid shadowing the built-in dict
    for indexofsamewords, names in enumerate(A):
        groups.setdefault(names, []).append(indexofsamewords)
    print(groups)
    # {'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}
    for index in groups.values():
        print([words[i] for i in index])


if __name__ == '__main__':
    # list of words
    words = ["ROOPA", "TABU", "OOPAR", "BUTA", "BUAT", "PAROO", "Soudipta",
             "Kheyali Park", "Tollygaunge", "AROOP", "Love", "AOORP", "Protijayi", "Paikpara", "dipSouta", "Shyambazaar",
             "jayiProti", "North Calcutta", "Sovabazaar"]
    groupAnagrams(words)
Output:
['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']
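The function above prints its groups; for reuse or testing it is handier to return them. A sketch under that assumption (the name `group_anagrams_returning` is hypothetical) that builds the same case-sensitive keys and keeps the groups in first-seen order:

```python
def group_anagrams_returning(words):
    groups = {}
    for word in words:
        # Same case-sensitive key as above: the word's characters, sorted
        groups.setdefault("".join(sorted(word)), []).append(word)
    # Dicts keep insertion order (Python 3.7+), so groups follow first appearance
    return list(groups.values())


print(group_anagrams_returning(["ROOPA", "TABU", "OOPAR", "BUTA", "Love"]))
# [['ROOPA', 'OOPAR'], ['TABU', 'BUTA'], ['Love']]
```

Because the key is case-sensitive, "Love" and "vole" end up in different groups; normalizing with `word.lower()` before sorting would merge them if that is the desired behaviour.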