Find group of strings that are anagrams

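This question refers to this problem on lintcode. I have a working solution, but it takes too long on the huge test cases. How can I improve it? Maybe I can reduce the number of comparisons made in the outer loop.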

class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        # write your code here
        ret=set()
        for i in range(0,len(strs)):
            for j in range(i+1,len(strs)):
                if i in ret and j in ret:
                    continue
                if Solution.isanagram(strs[i],strs[j]):
                    ret.add(i)
                    ret.add(j)

        return [strs[i] for i in list(ret)]


    @staticmethod
    def isanagram(s, t):
        if len(s)!=len(t):
            return False
        chars={}
        for i in s:
            if i in chars:
                chars[i]+=1
            else:
                chars[i]=1

        for i in t:
            if i not in chars:
                return False
            else:
                chars[i]-=1
                if chars[i]<0:
                    return False

        for i in chars:
            if chars[i]!=0:
                return False
        return True

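Update: Just to add, I am not looking for built-in Pythonic solutions such as using the already optimized Counter. I have added Mike's suggestions, but it still exceeds the time limit.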

Why not this?

str1 = "cafe"
str2 = "face"
def isanagram(s1, s2):
    # Two strings are anagrams exactly when their sorted letters match.
    return sorted(s1) == sorted(s2)

if isanagram(str1, str2):
    print("Woo")

Skip the strings you have already put into the set. Don't test them again.

# @param strs: A list of strings
# @return: A list of strings
def anagrams(self, strs):
    # write your code here
    ret=set()
    for i in range(0,len(strs)):
        for j in range(i+1,len(strs)):

            # If both anagrams exist in set, there is no need to compare them.
            if i in ret and j in ret:
                continue

            if Solution.isanagram(strs[i],strs[j]):
                ret.add(i)
                ret.add(j)

    return [strs[i] for i in list(ret)]

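You can also compare the lengths in the anagram test before iterating over the letters. If the strings have different lengths, they cannot be anagrams. Also, when a counter in chars reaches -1 while processing the characters of t, just return False. Don't iterate over chars again.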

@staticmethod
def isanagram(s, t):
    # Test strings are the same length
    if len(s) != len(t):
        return False

    chars={}
    for i in s:
        if i in chars:
            chars[i]+=1
        else:
            chars[i]=1

    for i in t:
        if i not in chars:
            return False
        else:
            chars[i]-=1
            # If this is below 0, return false
            if chars[i] < 0:
                return False

    # The lengths are equal and no count went negative, so every count
    # is exactly zero; there is no need to iterate over chars again.
    return True

As a complement to @Mike's excellent answer, here is a nice Pythonic approach:

import collections


class Solution:
    # @param strs: A list of strings
    # @return: A list of strings
    def anagrams(self, strs):
        patterns = Solution.find_anagram_words(strs)
        return [word for word in strs if ''.join(sorted(word)) in patterns]

    @staticmethod
    def find_anagram_words(strs):
        anagrams = collections.Counter(''.join(sorted(word)) for word in strs)
        return {word for word, times in anagrams.items() if times > 1}

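Instead of comparing all pairs of strings, you can create a dictionary (or collections.defaultdict) mapping each of the letter-counts to the words having those counts. For getting the letter-counts, you can use collections.Counter. Afterwards, you just have to get the values from that dict. If you want all words that are anagrams of any other word, just merge the lists that have more than one entry.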

from collections import Counter, defaultdict

strings = ["cat", "act", "rat", "hut", "tar", "tact"]
anagrams = defaultdict(list)

for s in strings:
    anagrams[frozenset(Counter(s).items())].append(s)

print([v for v in anagrams.values()])
# [['cat', 'act'], ['rat', 'tar'], ['hut'], ['tact']]  (insertion order on Python 3.7+)
print([x for v in anagrams.values() if len(v) > 1 for x in v])
# ['cat', 'act', 'rat', 'tar']

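Of course, if you don't want to use built-in functionality, with just a few more lines you can use a regular dict instead of defaultdict and write your own Counter, similar to what you have in your isanagram method, just without the comparison part.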
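A minimal sketch of that idea, assuming only a plain dict and no imports. The helper names `letter_counts` and `group_anagrams` are made up for illustration and do not come from the answers here. Because a dict is not hashable, the counts are turned into a sorted tuple of (letter, count) pairs so they can serve as a dictionary key.

```python
def letter_counts(word):
    """Hand-rolled stand-in for collections.Counter (hypothetical helper)."""
    counts = {}
    for ch in word:
        if ch in counts:
            counts[ch] += 1
        else:
            counts[ch] = 1
    # Dicts are not hashable, so build a hashable, order-independent key.
    return tuple(sorted(counts.items()))


def group_anagrams(strings):
    """Return every string that has at least one anagram partner in the list."""
    groups = {}
    for s in strings:
        key = letter_counts(s)
        if key in groups:
            groups[key].append(s)
        else:
            groups[key] = [s]
    # Keep only the groups with more than one member and flatten them.
    return [s for group in groups.values() if len(group) > 1 for s in group]


print(group_anagrams(["cat", "act", "rat", "hut", "tar", "tact"]))
```

Each string is visited once and compared against nothing else, so the number of steps grows with the number of strings rather than with the number of string pairs.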

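Your solution is slow because you are not taking advantage of Python's data structures.

Here is a solution that collects the results in a dictionary: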

class Solution:
    def anagrams(self, strs):
        d = {}
        for word in strs:
            key = tuple(sorted(word))
            try:
                d[key].append(word)
            except KeyError:
                d[key] = [word]
        return [w for ws in d.values() for w in ws if len(ws) > 1]

If you use Linq in C#, you can do the same thing in a single line of code:

// strs is the input string array (string[])

var result = strs.GroupBy(x => new string(x.ToCharArray().OrderBy(z => z).ToArray())).Select(g => g.ToList()).ToList();

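To group anagrams in Python, first sort the letters of each word in the list. Then build a dictionary keyed by the sorted words. Each value in the dictionary is the list of indices of the original words that share that key, which tells us where the anagrams are.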


def groupAnagrams(words):

    # Sort the letters of each word to get its anagram key
    A = [''.join(sorted(word)) for word in words]
    # Map each key to the indices of the words that share it
    # (named groups so the built-in dict is not shadowed)
    groups = {}
    for indexofsamewords, names in enumerate(A):
        groups.setdefault(names, []).append(indexofsamewords)
    print(groups)
    #{'AOOPR': [0, 2, 5, 9, 11], 'ABTU': [1, 3, 4], 'Sadioptu': [6, 14], ' KPaaehiklry': [7], 'Taeggllnouy': [8], 'Leov': [10], 'Paiijorty': [12, 16], 'Paaaikpr': [13], 'Saaaabhmryz': [15], ' CNaachlortttu': [17], 'Saaaaborvz': [18]}

    for index in groups.values():
        print([words[i] for i in index])


if __name__ == '__main__':

    # list of words
    words = ["ROOPA","TABU","OOPAR","BUTA","BUAT" , "PAROO","Soudipta",
        "Kheyali Park", "Tollygaunge", "AROOP","Love","AOORP", "Protijayi","Paikpara","dipSouta","Shyambazaar",
        "jayiProti", "North Calcutta", "Sovabazaar"]

    groupAnagrams(words)

Output (after the printed dictionary):


['ROOPA', 'OOPAR', 'PAROO', 'AROOP', 'AOORP']
['TABU', 'BUTA', 'BUAT']
['Soudipta', 'dipSouta']
['Kheyali Park']
['Tollygaunge']
['Love']
['Protijayi', 'jayiProti']
['Paikpara']
['Shyambazaar']
['North Calcutta']
['Sovabazaar']