Optimizing a Wordle Bot with Python - Search for words that contain a, b, and c?

Question

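I've been working on a Wordle bot and wanted to see how it does against all 13,000 words. The problem is that I'm running this through a for loop, which is very inefficient. After running for 30 minutes, it only got to about 5%. I could wait that long, but it would end up taking 10+ hours. There has to be a more efficient way. I'm new to Python, so any suggestions would be greatly appreciated.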

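The code here is what narrows down the word list after each guess. Is there a way to search for words that contain "a", "b", and "c" in one go, instead of running the search 3 separate times? Right now contains, nocontains, and isletter each run every time I need to search for a new letter. Searching for them together would greatly reduce the time.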

#Find the words that only match the criteria
def contains(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter in x:
                if letter == x[place]:
                    removed.append(x)
                else:
                    list.append(x)
            else:
                removed.append(x)
def nocontains(letter):
    list.clear()
    for x in words:
        if x not in removed:
            if letter not in x:
                list.append(x)
            else:
                removed.append(x)
def isletter(letter, place):
    list.clear()
    for x in words:
        if x not in removed:
            if letter == x[place]:
                list.append(x)
            else:
                removed.append(x)

Answer 1

I just wrote a Wordle bot that runs in about a second, including the web scraping to fetch the list of 5-letter words.

import urllib.request
from bs4 import BeautifulSoup

def getwords():
    source = "https://www.thefreedictionary.com/5-letter-words.htm"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.findAll("li", {"data-f": "15"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words

words = getwords()

def hasLetterAtPosition(letter,position,word):
    return letter==word[position]

def hasLetterNotAtPosition(letter,position,word):
    # The letter must occur in the word, but not at the given position
    return letter in word and word[position] != letter

def doesNotHaveLetter(letter,word):
    return letter not in word

lettersPositioned = [(0,"y")]
lettersMispositioned = [(0,"h")]
lettersNotHad = ["p"]

idx = 0
while idx<len(words):
    eliminated = False
    for criteria in lettersPositioned:
        if not hasLetterAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for criteria in lettersMispositioned:
        if not hasLetterNotAtPosition(criteria[1],criteria[0],words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    for letter in lettersNotHad:
        if not doesNotHaveLetter(letter,words[idx]):
            del words[idx]
            eliminated = True
            break
    if eliminated:
        continue
    idx+=1

print(words) # ["youth"]

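The reason yours is slow is that, on top of looping over every word for every check, there is a lot of redundant conditional logic, and you make many calls to test whether a word has already been removed.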
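As a rough illustration of checking all criteria in a single pass, here is a minimal sketch. It assumes the same three criteria lists as above; the function name `matches` and the sample word list are hypothetical and only there to make the example self-contained.

```python
def matches(word, positioned, mispositioned, absent):
    """Return True if word satisfies every constraint gathered so far."""
    # Correctly placed letters: the letter must sit exactly at that position.
    if any(word[pos] != letter for pos, letter in positioned):
        return False
    # Misplaced letters: the letter must occur, but not at that position.
    if any(word[pos] == letter or letter not in word
           for pos, letter in mispositioned):
        return False
    # Excluded letters: the letter must not occur at all.
    return not any(letter in word for letter in absent)

# Hypothetical sample list; a real run would use the scraped words.
sample_words = ["youth", "yacht", "happy", "hatch", "young"]
remaining = [w for w in sample_words
             if matches(w, [(0, "y")], [(0, "h")], ["p"])]
print(remaining)
```

Each word is visited once and rejected at the first failing check, so there is no need to track removed words separately.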

Edit: here is a getwords function that fetches more words.

def getwords():
    source = "https://wordfind-com.translate.goog/length/5-letter-words/?_x_tr_sl=es&_x_tr_tl=en&_x_tr_hl=en&_x_tr_pto=wapp"
    filehandle = urllib.request.urlopen(source)
    soup = BeautifulSoup(filehandle.read(), "html.parser")
    wordslis = soup.findAll("a", {"rel": "nofollow"})
    words = []
    for k in wordslis:
        words.append(k.getText())
    return words

Answer 2

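Using sets greatly reduces the performance problems. Any time you want to test for membership repeatedly (even just a few times), e.g. if x not in removed, you should try to use a set. A list has to check every element to find x, which is bad if the list has thousands of elements. With a Python set, if x not in removed should take about the same time to run whether removed has 100 elements or 100,000: a small, constant amount of time.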
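A quick way to see the difference for yourself is a sketch like the one below. The data is made up (100,000 placeholder strings standing in for removed words), and the timings depend on your machine, so no numbers are claimed here.

```python
import timeit

# Hypothetical data: 100,000 distinct five-character strings.
removed_list = [f"{i:05d}" for i in range(100_000)]
removed_set = set(removed_list)

probe = "zzzzz"  # not present, so the list has to scan every element

list_time = timeit.timeit(lambda: probe in removed_list, number=20)
set_time = timeit.timeit(lambda: probe in removed_set, number=20)
print(f"list: {list_time:.4f}s  set: {set_time:.6f}s")
```

The list lookup walks the whole list on every miss, while the set lookup is a hash probe.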

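Beyond that, you'll run into problems trying to use mutable global variables everywhere, such as list (which needs to be renamed, since it shadows the built-in) and removed. There is no benefit to doing that and several downsides, such as making your code harder to reason about or optimize. One benefit of Python is that you can pass large containers or objects to functions without any extra time or space cost: calling a function f(huge_list) is as fast and uses as much memory as calling it on a tiny list, as if you were passing by reference in other languages, so don't hesitate to use containers as function arguments or return types.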
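To illustrate the point about passing containers, here is a small sketch. The function `count_words` and the list contents are hypothetical; the only thing it shows is that the function receives the same object the caller holds.

```python
def count_words(words):
    # `words` is the very same object the caller holds; nothing is copied.
    return len(words), id(words)

huge_list = ["crane"] * 1_000_000  # hypothetical large container
length, seen_id = count_words(huge_list)
print(length, seen_id == id(huge_list))
```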

Putting it all together, here is how your code could be refactored if you drop 'list' and 'removed' and instead store a set of possible words:

all_words = []  # Huge word list to read in from text file
current_possible_words = set(all_words)

def contains_only_elsewhere(possible_words, letter, place):
    """Given letter and place, keep only the words in possible_words
     that contain letter, but not at place"""
    to_remove = {word for word in possible_words
                 if letter not in word or word[place] == letter}
    return possible_words - to_remove

def must_not_contain(possible_words, letter):
    """Given a letter, remove from possible_words all words containing letter"""
    to_remove = {word for word in possible_words
                 if letter in word}
    return possible_words - to_remove

def exact_letter_match(possible_words, letter, place):
    """Given a letter and place, remove from possible_words
     all words not containing letter at place"""
    to_remove = {word for word in possible_words
                 if word[place] != letter}
    return possible_words - to_remove

The calling code changes accordingly, for example:

current_possible_words = exact_letter_match(current_possible_words, 'a', 2)

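Further optimizations are possible (and easier now): store only the indices of words rather than the strings; precompute, for each letter, the set of all words containing that letter; and so on.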
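The per-letter precomputation could look something like the sketch below, which also covers the "contains a, b, and c" search from the question. The helper names `build_letter_index` and `words_with_all` and the sample word list are assumptions made for illustration.

```python
from collections import defaultdict

def build_letter_index(words):
    """Map each letter to the set of words that contain it."""
    index = defaultdict(set)
    for word in words:
        for letter in set(word):
            index[letter].add(word)
    return index

def words_with_all(index, letters):
    """Return the words containing every letter in `letters`."""
    sets = [index.get(letter, set()) for letter in letters]
    # With no letters given there is nothing to intersect; return an empty set.
    return set.intersection(*sets) if sets else set()

# Hypothetical sample list; a real run would use the full word list.
all_words = ["cabin", "bacon", "crane", "abbey", "black"]
index = build_letter_index(all_words)
print(sorted(words_with_all(index, "abc")))
```

The index is built once, so each later query is just an intersection of already-computed sets rather than another scan over every word.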

