单词列表中最长的单词链

Longest word chain from a list of words

所以,这是我正在尝试制作的功能的一部分。

不想代码太复杂

我有一个单词列表,例如

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

单词链序列的想法是下一个单词以最后一个单词结束的字母开头。

(编辑:每个词不能使用超过一次。除此之外没有其他限制。)

我希望输出给出最长的词链序列,在本例中是:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

我不太确定该怎么做,我尝试过不同的尝试。其中之一...

如果我们从列表中的特定单词开始,此代码会正确找到单词链,例如单词[0](所以'giraffe'):

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []

word_chain.append(words[0])

for word in words:
    for char in word[0]:

       if char == word_chain[-1][-1]:
            word_chain.append(word)

print(word_chain)

输出:

['giraffe', 'elephant', 'tiger', 'racoon']

但是,我想找到最长的单词链(上面已解释)。

我的方法: 所以,我尝试使用我编写的上述工作代码并循环遍历,使用列表中的每个单词作为起点并找到单词链对于每个单词 [0]、单词 [1]、单词 [2] 等。然后我尝试使用 if 语句找到最长的单词链,并将长度与之前最长的链进行比较,但我无法完成正确,我真的不知道这是怎么回事。

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

word_chain = []
max_length = 0
for starting_word_index in range(len(words) - 1):

    word_chain.append(words[starting_word_index])

    for word in words:
        for char in word[0]:

            if char == word_chain[-1][-1]:
                word_chain.append(word)

    # Not sure

    if len(word_chain) > max_length:
        final_word_chain = word_chain
        longest = len(word_chain)
        word_chain.clear()

print(final_word_chain)

这是我的第 n 次尝试,我认为这一次打印了一个空列表,在此之前我有过不同的尝试,但未能正确清除 word_chain 列表并最终再次重复单词。

非常感谢任何帮助。希望我没有让这太乏味或令人困惑......谢谢!

您可以使用递归来探索每个 "branch" 当包含正确初始字符的每个可能的字母被添加到 运行 列表时出现的每个 "branch":

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
def get_results(_start, _current, _seen):
  if all(c in _seen for c in words if c[0] == _start[-1]):
    yield _current
  else:
      for i in words:
        if i[0] == _start[-1]:
          yield from get_results(i, _current+[i], _seen+[i])


new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

这个解决方案的工作原理类似于广度优先搜索,因为函数 get_resuls 将继续遍历整个列表,只要当前值之前没有被调用过。函数看到的值被添加到 _seen 列表中,最终停止递归调用流。

此解决方案还将忽略重复的结果:

words = ['giraffe', 'elephant', 'ant', 'ning', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse',]
new_d = [list(get_results(i, [i], []))[0] for i in words]
final_d = max([i for i in new_d if len(i) == len(set(i))], key=len)

输出:

['ant', 'tiger', 'racoon', 'ning', 'giraffe', 'elephant']

希望有一种无需递归的更直观的方法。遍历列表并让 Python 的排序和列表理解为您完成工作:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chain_longest(pivot, words):
    new_words = []
    new_words.append(pivot)
    for word in words:
        potential_words = [i for i in words if i.startswith(pivot[-1]) and i not in new_words]
        if potential_words:
            next_word = sorted(potential_words, key = lambda x: len)[0]
            new_words.append(next_word)
            pivot = next_word
        else:
            pass
    return new_words

max([chain_longest(i, words) for i in words], key = len)
>>
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

设置一个枢轴并检查 potential_words 它们是否以您的枢轴词开头并且没有出现在您的新单词列表中。如果找到,则只需按长度对它们进行排序并取第一个元素。

列表推导式将每个单词作为一个支点,returns 你是最长的链条。

此函数创建一种称为生成器的迭代器(参见:What does the "yield" keyword do?)。它递归地创建同一生成器的更多实例以探索所有可能的尾部序列:

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']

def chains(words, previous_word=None):
    # Consider an empty sequence to be valid (as a "tail" or on its own):
    yield []
    # Remove the previous word, if any, from consideration, both here and in any subcalls:
    words = [word for word in words if word != previous_word]
    # Take each remaining word...
    for each_word in words:
        # ...provided it obeys the chaining rule
        if not previous_word or each_word.startswith(previous_word[-1]):
            # and recurse to consider all possible tail sequences that can follow this particular word:
            for tail in chains(words, previous_word=each_word):
                # Concatenate the word we're considering with each possible tail:
                yield [each_word] + tail  

all_legal_sequences = list(chains(words))  # convert the output (an iterator) to a list
all_legal_sequences.sort(key=len) # sort the list of chains in increasing order of chain length
for seq in all_legal_sequences: print(seq)
# The last line (and hence longest chain) prints as follows:
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

或者,更有效地直接获得最长的链:

print(max(chains(words), key=len)

最后,这是一个允许在输入中重复单词的替代版本(即,如果您包含一个单词 N 次,您最多可以在链中使用它 N 次):

def chains(words, previous_word_index=None):
    yield []
    if previous_word_index is not None:
        previous_letter = words[previous_word_index][-1]
        words = words[:previous_word_index] + words[previous_word_index + 1:]
    for i, each_word in enumerate( words ):
        if previous_word_index is None or each_word.startswith(previous_letter):
            for tail in chains(words, previous_word_index=i):
                yield [each_word] + tail  

这是一个可行的递归蛮力方法:

def brute_force(pool, last=None, so_far=None):
    so_far = so_far or []
    if not pool:
        return so_far
    candidates = []
    for w in pool:
        if not last or w.startswith(last):
            c_so_far, c_pool = list(so_far) + [w], set(pool) - set([w])
            candidates.append(brute_force(c_pool, w[-1], c_so_far))
    return max(candidates, key=len, default=so_far)

>>> brute_force(words)
['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

在每次递归调用时,这会尝试从剩余池中的每个符合条件的单词继续链。然后它选择最长的这样的延续。

另一个使用递归方法的答案:

def word_list(w_list, remaining_list):
    max_result_len=0
    res = w_list
    for word_index in range(len(remaining_list)):
        # if the last letter of the word list is equal to the first letter of the word
        if w_list[-1][-1] == remaining_list[word_index][0]:
            # make copies of the lists to not alter it in the caller function
            w_list_copy = w_list.copy()
            remaining_list_copy = remaining_list.copy()
            # removes the used word from the remaining list
            remaining_list_copy.pop(word_index)
            # append the matching word to the new word list
            w_list_copy.append(remaining_list[word_index])
            res_aux = word_list(w_list_copy, remaining_list_copy)
            # Keep only the longest list
            res = res_aux if len(res_aux) > max_result_len else res 
    return res

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
word_list(['dog'], words)

输出:

['dog', 'giraffe', 'elephant', 'tiger', 'racoon']

本着暴力解决方案的精神,您可以检查 words 列表的所有排列并选择最佳的连续开始序列:

from itertools import permutations

def continuous_starting_sequence(words):
    chain = [words[0]]
    for i in range(1, len(words)):
        if not words[i].startswith(words[i - 1][-1]):
            break
        chain.append(words[i])
    return chain

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat', 'hedgehog', 'mouse']
best = max((continuous_starting_sequence(seq) for seq in permutations(words)), key=len)

print(best)
# ['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

由于我们考虑的是所有排列,我们知道必须有一个以最大词链开头的排列。

这当然有 O(n n!) 时间复杂度 :D

对于这个问题,我有一个基于树的方法,它可能会更快。我仍在努力实现代码,但这是我要做的:

    1. Form a tree with the root node as first word. 
    2. Form the branches if there is any word or words that starts 
with the alphabet with which this current word ends.
    3. Exhaust the entire given list based on the ending alphabet
 of current word and form the entire tree.
    4. Now just find the longest path of this tree and store it.
    5. Repeat steps 1 to 4 for each of the words given in the list
 and print the longest path among the longest paths we got above.

我希望这可以提供更好的解决方案,以防给出大量单词。我会用实际的代码实现来更新它。

我有个新想法,如图:

我们可以通过word[0] == word[-1]构造一个有向图,然后将问题转化为求最大长度路径

正如其他人所说,问题是找到longest path in a directed acyclic graph

对于 Python 中相关的任何图形,networkx 是你的朋友。

您只需要初始化图形、添加节点、添加边并启动 dag_longest_path:

import networkx as nx
import matplotlib.pyplot as plt

words = ['giraffe', 'elephant', 'ant', 'tiger', 'racoon', 'cat',
         'hedgehog', 'mouse']

G = nx.DiGraph()
G.add_nodes_from(words)

for word1 in words:
    for word2 in words:
        if word1 != word2 and word1[-1] == word2[0]:
            G.add_edge(word1, word2)
nx.draw_networkx(G)
plt.show()
print(nx.algorithms.dag.dag_longest_path(G))

它输出:

['hedgehog', 'giraffe', 'elephant', 'tiger', 'racoon']

注意:此算法仅在图中没有循环(循环)的情况下才有效。这意味着它会因 ['ab', 'ba'] 而失败,因为会有一条无限长的路径:['ab', 'ba', 'ab', 'ba', 'ab', 'ba', ...]