Matching a string (any order) to strings in an array of a huge size

Question

def Get_Word_List():
    # Build a fresh list on every call; a mutable default argument
    # would keep growing across calls.
    Word_List = []
    with open("Words.txt") as File:  # File of 250k+ words, one per line
        for Line in File:
            Word_List.append(Line.rstrip("\n"))
    return Word_List

def Get_Input():
    Str = raw_input("Input 7 letters: ")
    while len(Str) != 7:
        Str = raw_input("Input 7 letters: ")
    return Str.upper()

def Find_Words():
    Letters = Get_Input()
    List = Get_Word_List()  # A list of strings, all in uppercase
    for Word in List:
        pass  # This is where each Word should be checked against Letters

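I'm trying to match a string (maximum length 7), with its letters in any order, against one or more words in an array of 250k+ words. For example, "ZZAFIEA" could yield "FIZZ" or "FEZ". I can't find a way to do it, and I've tried all sorts of things. Any help is appreciated.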

Answer 1

Here is a nice solution:

from collections import Counter


def counter_is_subset(x, y):
    # If subtracting y from x doesn't return an empty Counter,
    # x is NOT a subset of y.
    return not (x - y)


def find_buildable_words(words, letters):
    letters = Counter(letters)

    for word in words:
        if counter_is_subset(Counter(word), letters):
            yield word


words = ['BLAH', 'FIZZ', 'FEZ', 'FOO', 'FAZE', 'ZEE']
letters = 'ZZAFIEA'

buildable_words = find_buildable_words(words, letters)

for word in buildable_words:
    print(word)

On my computer, this takes about 1.2 seconds to run on a list of 250,000 words.
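To see why the subtraction test works: Counter subtraction keeps only positive counts, so any letters a word needs beyond what is available survive in the result. Below is a minimal sketch using the sample letters from above. The names `available`, `leftover_fizz` and `leftover_foo` are only for illustration and are not part of the answer's code.

```python
from collections import Counter

available = Counter('ZZAFIEA')

# FIZZ needs F:1, I:1, Z:2. All of these are covered, so nothing is left over.
leftover_fizz = Counter('FIZZ') - available
print(leftover_fizz)  # Counter()

# FOO needs two O's that the available letters do not provide.
leftover_foo = Counter('FOO') - available
print(leftover_foo)  # Counter({'O': 2})
```

An empty Counter is falsy, which is why `not (x - y)` is True exactly when the word can be built.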

Answer 2

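You can use itertools.ifilter: write a predicate that checks whether a word from the list can be made from your string, then run ifilter with that predicate over the list. (ifilter exists only in Python 2; in Python 3 the built-in filter is already lazy.)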

Demo:

from itertools import ifilter
compare_against = "ABEFGZ"
lst = ['EFZ', 'ZIP', 'AGA', 'ABM']

def pred(word):
    for char in word:
        if char not in compare_against:
            return False
    return True

x = ifilter(pred, lst)

for y in x:
    print y

Output:

EFZ
AGA

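Disclaimer

This example does not handle repeated characters well. It is up to you to decide whether AGA should be returned or not, since the character 'A' appears only once in compare_against. If you decide that AGA is not a valid output, you should modify the pred function to enforce that restriction.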
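One possible way to adapt pred so that it respects repeated characters is to compare letter counts instead of testing membership. This is only a sketch: `pred_with_counts`, `available` and `matches` are hypothetical names that do not appear in the demo, and the built-in filter is used so the snippet also runs on Python 3.

```python
from collections import Counter

compare_against = "ABEFGZ"
lst = ['EFZ', 'ZIP', 'AGA', 'ABM']

# Count the available letters once, outside the predicate.
available = Counter(compare_against)

def pred_with_counts(word):
    # Hypothetical variant of pred: each letter may be used at most
    # as many times as it occurs in compare_against.
    needed = Counter(word)
    return all(available[char] >= count for char, count in needed.items())

matches = list(filter(pred_with_counts, lst))
print(matches)  # ['EFZ']
```

With this variant AGA is rejected, because it needs two A's and compare_against contains only one.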

python

arrays

string

string-matching

python-2.7