python 代码的时间复杂度,用于查找可以由列表中的其他单词组成的最长单词(后续)

Time complexity of python code for finding the longest word that can be made of other words in the list (follow-up)

我知道已经提出并解决了类似的问题 (),但我有一个后续问题。

我有以下问题,

# Given a list of words, find the longest word made of other words in the list.
# Example: [cat, banana, dog, nana, walk, walker, dogwalker]
# Result: dogwalker

我在 Python 3 中有以下实现:

def find_longest_word(list_words):
    list_words = sorted(list_words, key=lambda x:len(x))[::-1]
    for elem in list_words:
        set_remaining = set(list_words) - set([elem])
        if find_if_composite(elem, set_remaining, {}):
            return elem
    return None

def find_if_composite(s, set_):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite(s[i:], set_): # assuming that I can reuse the string
                return True
    return False

我们从之前的回答中得知这段代码的运行时间复杂度为 O(N!),其中 N 是字符串的大小(如果我没记错的话)。

我的问题是:有什么方法可以提高时间复杂度(例如,使用记忆)?如果不是,为什么?如果是,如何?我尝试了以下代码,但它似乎在递归调用期间从未触及备忘录。

def find_if_composite_memo(s, set_, memo):
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                return True
    memo[s] = False
    return False

要测试,请使用

b = ["cat", "banana", "dog", "nana", "walk", "walker", "dogwalker", "dogwalkerbanana"]
print(find_longest_word(b))

只有当你有一个完整的解决方案时,你才使用备忘录。 由于您按长度顺序对列表进行排序,因此您的第一个完整解决方案就是所需的答案。 所以,你永远不会用到你的备忘录。

我在你的记忆例程中添加了一点跟踪,并稍微增强了测试:

def find_if_composite_memo(s, set_, memo):
    print "ENTER", s, '\t', memo
    n = len(s)
    if len(set_) == 0 or n == 0:
        return False
    if s in memo:
        print "Found", s, memo[s]
        return memo[s]
    if s in set_:
        return True
    for i in range(1, n+1):
        this_s = s[:i]
        if this_s in set_:
            if find_if_composite_memo(s[i:], set_, memo):
                memo[s] = True
                memo[s[i:]] = True
                print "Add", memo
                return True
    memo[s] = False
    return False

b = ["cat", "banana", "dog", "nana",
     "walk", "walker",
     "dogcat", "dogwalker",
     "dogwalkerbanana",
     "catwalkerbanana-FAIL"]
print(find_longest_word(b))

输出:

ENTER catwalkerbanana-FAIL  {}
ENTER walkerbanana-FAIL     {}
ENTER erbanana-FAIL     {}
ENTER banana-FAIL   {'erbanana-FAIL': False}
ENTER -FAIL     {'erbanana-FAIL': False}
ENTER dogwalkerbanana   {}
ENTER walkerbanana  {}
ENTER erbanana  {}
ENTER banana    {'erbanana': False}
Add {'walkerbanana': True, 'erbanana': False, 'banana': True}
Add {'walkerbanana': True, 'erbanana': False, 'dogwalkerbanana': True, 'banana': True}
dogwalkerbanana

你看到问题了吗?使用 "catwalkerbanana-FAIL",您可以记录 "catwalk"、"catwalker" 和 "walkerbanana",但不要利用它。最后一个可以为您节省一些搜索时间。

至于改进整体算法,你可以考虑SE的code review网站。

如果您使用以下一些方法,我希望您可以做得更好:

(1) When you fail to find the present top-level word **elem** as a composite, you can remove it from the word list entirely.  Don't keep it in set_remaining: since it's at least as long as any other word in the list, it cannot be a component.
(2) In **find_if_composite**, you go through a lot of pain to determine whether any beginning substring of **s** is in the word_list.  I think you can speed this up: use **s.startswith(word)** on each word in the list.  
(3) If the list has enough composite hits to matter, examine your memoization algorithm carefully.  Figure out what is and is not worth saving, and write the pseudo-code to handle only the time-saving parts.  Remember that, in working from largest to smallest, you're going to short-circuit a lot of cases.  In this test set, though, finding "walkerbanana" the first time seen would save a recursion later.

也找找好的地方用其他的string operations,比如index(substring search).