How to loop through a list of words and only keep the ones that have specific letters at specific indexes in Python

Question

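I have a list of 200,000 words, a list of indexes, and a keyword. indexList is not predefined and can be any size from 0 to len(keyword).

I want to loop through the 200,000 words and keep only the words that have the keyword's letters at the given indexes.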

Examples:

keyword = "BEANS" 
indexList = [0, 3]

I want to keep the words that have 'B' at index 0 and 'N' at index 3.

keyword = "BEANS"
indexList = [0, 1, 2]

I want to keep the words that have 'B' at index 0, 'E' at index 1, and 'A' at index 2.

keyword = "BEANS"
indexList = []

No letters are required, so return all 200,000 words.

Currently, I have this code. sampleSpace refers to the list of 200,000 words.

extractedList = []
for i in range(len(indexList)):
    for word in sampleSpace:      
        if (word[indexList[i]] == keyword[indexList[i]]):
            extractedList.append(word)

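However, this code extracts words that match at the first index OR at the second index OR at the Nth index.

I need the words to have the matching letter at ALL of the specified indexes.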

Answer 1

You can use a simple comprehension with all. The comprehension loops over every word in the big word list, and all checks every index in indexList:

>>> from wordle_solver import wordle_corpus as corpus
>>> keyword = "BEANS"
>>> indexList = [0, 3]
>>> [word for word in corpus if all(keyword[i] == word[i] for i in indexList)]
['BLAND', 'BRUNT', 'BUNNY', 'BLANK', 'BRINE', 'BLEND', 'BLINK', 'BLUNT', 'BEING', 'BRING', 'BRINY', 'BOUND', 'BLOND', 'BURNT', 'BORNE', 'BRAND', 'BRINK', 'BLIND']
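
Since wordle_corpus comes from a module that may not be installed, here is a minimal self-contained sketch of the same comprehension. The tiny corpus list and the helper name filter_by_positions are made up for illustration and are not part of the original answer.

```python
def filter_by_positions(words, keyword, index_list):
    """Keep the words that match keyword at every index in index_list."""
    return [
        word
        for word in words
        if all(word[i] == keyword[i] for i in index_list)
    ]


# Hypothetical stand-in for the real 200,000-word list.
corpus = ["BLAND", "BEANS", "BRINE", "CRANE", "BOUND", "BEARS"]

print(filter_by_positions(corpus, "BEANS", [0, 3]))     # 'B' at 0 and 'N' at 3
print(filter_by_positions(corpus, "BEANS", [0, 1, 2]))  # starts with "BEA"
print(filter_by_positions(corpus, "BEANS", []))         # no constraints
```

With an empty index list, all receives an empty iterable and returns True, so every word is kept without any special handling.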

Answer 2

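First, change your logic so that your outer loop is for word in sampleSpace. This is because you want to consider one word at a time and look at all the relevant indexes in that word.

Next, look up the all() function, which returns True if every element of the iterable you give it is truthy. How do we apply it here? We want to check whether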

all(
    word[index] == keyword[index] for index in indexList
)

So we have:

extractedWords = []
for word in sampleSpace:
    if all(word[index] == keyword[index] for index in indexList):
        extractedWords.append(word)

Now, because this loop only builds a list, we can write it as a list comprehension like this:

extractedWords = [word 
                    for word in sampleSpace 
                    if all(word[index] == keyword[index] for index in indexList)
                 ]

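You can handle the empty indexList case separately with an if condition before doing anything else. This is optional, because all() of an empty iterable is True and the comprehension would keep every word anyway, but it avoids looping over the whole list.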

def search_keyword_index(sampleSpace, keyword, indexList):
    if not indexList:
        return sampleSpace # or return sampleSpace[:] if you need to return a copy

    return [word for word in sampleSpace if all(word[index] == keyword[index] for index in indexList)]
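
One thing the function above does not cover: word[index] raises IndexError when a word is shorter than one of the requested indexes. If the word list can contain words of mixed length, a length check inside all avoids that. The sketch below is an assumption about how you might want to treat short words (they are dropped); the name search_keyword_index_safe and the sample data are hypothetical.

```python
def search_keyword_index_safe(sample_space, keyword, index_list):
    """Like search_keyword_index, but skips words too short for an index."""
    return [
        word
        for word in sample_space
        # The length check runs first, so word[index] is never out of range.
        if all(index < len(word) and word[index] == keyword[index]
               for index in index_list)
    ]


# Hypothetical sample with one word that is too short for index 3.
sample = ["BE", "BEANS", "BRAND", "CLEAN"]

print(search_keyword_index_safe(sample, "BEANS", [0, 3]))
print(search_keyword_index_safe(sample, "BEANS", []))
```

This assumes the indexes are non-negative, as in the question.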

Answer 3

You can build a set of (index, character) pairs and use it to quickly compare against each word in the list:

with open("/usr/share/dict/words") as f:
    words = f.read().upper().split('\n') # 235,887 words

keyword   = "ELEPHANT"
indexList = [0, 3, 5, 7]

letterSet = {(i,keyword[i]) for i in indexList}

for word in words:
    if letterSet.issubset(enumerate(word)):
        print(word)

EGGPLANT
ELEPHANT
ELEPHANTA
ELEPHANTIAC
ELEPHANTIASIC
ELEPHANTIASIS
ELEPHANTIC
ELEPHANTICIDE
ELEPHANTIDAE
ELEPHANTINE
ELEPHANTLIKE
ELEPHANTOID
ELEPHANTOIDAL
ELEPHANTOPUS
ELEPHANTOUS
ELEPHANTRY
EPIPLASTRAL
EPIPLASTRON

You can use a comprehension to put the results in a list:

letterSet = {(i,keyword[i]) for i in indexList}    
eligible  = [word for word in words if letterSet.issubset(enumerate(word))]

print(len(eligible)) # 18
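
For anyone without /usr/share/dict/words, here is a small self-contained sketch of the same idea. enumerate(word) yields (index, character) pairs, so the subset test passes only when every required pair appears in the word. A word that is too short simply lacks the pair, so no IndexError is raised. The helper name filter_with_pair_set and the sample list are made up for illustration.

```python
def filter_with_pair_set(words, keyword, index_list):
    """Keep words that contain every required (index, letter) pair."""
    required = {(i, keyword[i]) for i in index_list}
    # set.issubset accepts any iterable, so enumerate(word) works directly.
    return [word for word in words if required.issubset(enumerate(word))]


# Hypothetical sample; "ELF" is shorter than some of the requested indexes.
sample_words = ["EGGPLANT", "ELEPHANT", "ELF", "ELEGANCE"]

print(filter_with_pair_set(sample_words, "ELEPHANT", [0, 3, 5, 7]))
print(filter_with_pair_set(sample_words, "ELEPHANT", []))
```

With an empty index list the required set is empty, and the empty set is a subset of anything, so all words are returned.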

Tags: python, indexing, for-loop, list, data-extraction