Offline dictionary program: find both similar words and words that begin the same way

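I wrote this offline dictionary program. Each time the user presses a key, I want the program to look in the database and find a word close to what the user has typed so far. If the user has typed a complete word that is in the database, the program should show that word and its meanings.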

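That part works fine. Next I wanted, for example, that when the user types the word "a", the program shows all words in the database that begin with "a".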

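Here is an example of my problem: when we type "a", all words beginning with "a" should be shown together with their meanings. But the program shows the following: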

Here is part of my database in JSON format:

{"apple": ["Apple", "apple", "Sib", "Apfel", "Des pommes"], "average": ["Average", "average", "Miangin", "Durchschnitt", "Des pommes"], "acknowledge": ["Acknowledge", "acknowledge", "Tasdigh Kardan", "Zu bestatigen", "Pour reconnaître"], "book": ["Book", "book", "Ketab", "Buch", "Livre"], "banana": ["Banana", "banana", "Mouz", "Bananen", "Bananes"], "beach grass": ["Beach Grass", "beach grass", "Chamane Sahel", "Strandhafer", "herbe de plage"], "cat": ["Cat", "cat", "Gorbe", "Katzen", "chatte"], "certificate": ["Certificate", "certificate", "Govahi Name", "Zertifikat", "certificat"], "declaration of conformity": ["Declaration Of Conformity", "declaration of conformity", "Elamie Entebagh", "Konformitatserklarung", "déclaration de conformité"], "database": ["Database", "database", "Paygah Dade", "Datenbank", "base de données"], "dear colleagues": ["Dear Colleagues", "dear colleagues", "Hamkarane Aziz", "Liebe Mitarbeiterinnen und Mitarbeiter", "Chers collègues"]}

In this dictionary, every word has meanings in English, Persian, French and German.

You can see my code below:

import json
import msvcrt
import os 
from difflib import get_close_matches

DataBase = json.load(open("DataBase.json"))

def getMeaning(w):

    w = w.lower()
    n = len(w)

    if w in DataBase:
        return DataBase[w]

    elif len(get_close_matches(w,DataBase.keys(),1,0.3)) > 0:
        close_match = get_close_matches(w,DataBase.keys(),1,0.3)[0]
        print("Not Found!\nCheck The Close Match:\n")
        return DataBase[close_match]

    else:
        print ("Not Found!\n")
        res = [value for key, value in DataBase.items()]
        for i in res:
            for j in i:
                if w in j[0:n].lower(): 
                     print(j)
        return ''

word = '' 
while True:
    if msvcrt.kbhit():
        temp = msvcrt.getwch()
        word += temp
        os.system('cls')
        print(word)
        print("\n")
        meaning = getMeaning(word)
        for item in meaning:
            print(item)

Note that because of msvcrt.kbhit(), you must run this program in CMD for it to work properly.

If someone types a, you call getMeaning, which in turn calls get_close_matches. You then check whether that call returned a non-empty list, and if it did, you execute return DataBase[close_match], and getMeaning ends there.

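If get_close_matches yields a result, you will never reach the else part of getMeaning. In the screenshot in your question we can see the result of the user typing a, which makes sense, because get_close_matches considers cat to be similar to a.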
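To make that concrete, here is a small sketch that scores "a" against the keys of the sample database. It assumes that difflib.SequenceMatcher.ratio() is the similarity measure behind get_close_matches (which is what the difflib documentation describes); the names keys, similarity, scores and best are only for illustration.

```python
from difflib import SequenceMatcher, get_close_matches

# Keys of the sample database from the question
keys = [
    "apple", "average", "acknowledge", "book", "banana", "beach grass",
    "cat", "certificate", "declaration of conformity", "database",
    "dear colleagues",
]


def similarity(candidate, typed):
    """Similarity in [0, 1]: 2 * matching characters / total characters."""
    return SequenceMatcher(None, candidate, typed).ratio()


# Score every key against the single typed letter "a"
scores = {key: round(similarity(key, "a"), 3) for key in keys}

# Same call as in the question: at most 1 result, cutoff 0.3
best = get_close_matches("a", keys, n=1, cutoff=0.3)

print(scores["cat"], scores["apple"])  # 0.5 0.333
print(best)                            # ['cat']
```

Because "cat" is short, the single shared letter is a larger share of the total length than it is for "apple", so "cat" wins and the else branch is never reached.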

Even so, you should use startswith if you want to test whether a string begins with another string. Also, you don't need elif or else after a preceding if or elif that returns, and I have changed the names according to the PEP 8 section "Descriptive: Naming Styles".

Here is a possible solution, using a filter that only accepts a close match if it is made up of the same set of letters as word:

from difflib import get_close_matches

database = {"apple": ["Apple", "apple", "Sib", "Apfel", "Des pommes"], "average": ["Average", "average", "Miangin", "Durchschnitt", "Des pommes"], "acknowledge": ["Acknowledge", "acknowledge", "Tasdigh Kardan", "Zu bestatigen", "Pour reconnaître"], "book": ["Book", "book", "Ketab", "Buch", "Livre"], "banana": ["Banana", "banana", "Mouz", "Bananen", "Bananes"], "beach grass": ["Beach Grass", "beach grass", "Chamane Sahel", "Strandhafer", "herbe de plage"], "cat": ["Cat", "cat", "Gorbe", "Katzen", "chatte"], "certificate": ["Certificate", "certificate", "Govahi Name", "Zertifikat", "certificat"], "declaration of conformity": ["Declaration Of Conformity", "declaration of conformity", "Elamie Entebagh", "Konformitatserklarung", "déclaration de conformité"], "database": ["Database", "database", "Paygah Dade", "Datenbank", "base de données"], "dear colleagues": ["Dear Colleagues", "dear colleagues", "Hamkarane Aziz", "Liebe Mitarbeiterinnen und Mitarbeiter", "Chers collègues"]}

def get_meaning(word):

    # Make word case-insensitive
    word = word.lower()

    # Check if word already in database
    if word in database:
        return {word: database[word]}

    # Find possible close matches
    close_matches = get_close_matches(word, database.keys(), 1, 0.3)
    # Filter matches: keep only those which contain the same letters
    close_matches = [
        close_match
        for close_match in close_matches
        if set(close_match) == set(word)
    ]
    # Return close matches if any left
    if close_matches:
        return {
            close_match: database[close_match]
            for close_match in close_matches
        }

    # Return all dictionary entries which start with the word
    return {
        entry: database[entry]
        for entry in database
        if entry.startswith(word)
    }

Now a no longer yields cat:

>>> get_meaning("a")
{'apple': ['Apple', 'apple', 'Sib', 'Apfel', 'Des pommes'], 'average': ['Average', 'average', 'Miangin', 'Durchschnitt', 'Des pommes'], 'acknowledge': ['Acknowledge', 'acknowledge', 'Tasdigh Kardan', 'Zu bestatigen', 'Pour reconnaître']}

But applle is still recognized as apple:

>>> get_meaning("applle")
{'apple': ['Apple', 'apple', 'Sib', 'Apfel', 'Des pommes']}

Alternatively, you can change the cutoff argument in the call to get_close_matches to get different results.

In get_close_matches, the optional argument cutoff is a float in the range [0, 1].
Possibilities that don't score at least that similar to word are ignored.
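As a rough illustration of what cutoff does, the sketch below runs the same two inputs through get_close_matches with both thresholds, using the keys of the sample database. The names keys and results are only for illustration.

```python
from difflib import get_close_matches

# Keys of the sample database from the question
keys = [
    "apple", "average", "acknowledge", "book", "banana", "beach grass",
    "cat", "certificate", "declaration of conformity", "database",
    "dear colleagues",
]

# Collect the single best match for each (cutoff, typed text) pair
results = {
    (cutoff, typed): get_close_matches(typed, keys, n=1, cutoff=cutoff)
    for cutoff in (0.3, 0.8)
    for typed in ("a", "applle")
}

for (cutoff, typed), matches in results.items():
    print(f"cutoff={cutoff} typed={typed!r} -> {matches}")
```

With cutoff 0.3 the single letter a is matched to cat, while with cutoff 0.8 it matches nothing, so the code can fall through to the prefix search. The misspelling applle still resolves to apple at both thresholds.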

So I only needed to change the cutoff of get_close_matches from 0.3 to 0.8.
That solved my problem.

    elif len(get_close_matches(w,DataBase.keys(),1,0.8)) > 0:
        close_match = get_close_matches(w,DataBase.keys(),1,0.8)[0]
        print("Not Found!\nCheck The Close Match:\n")
        return DataBase[close_match]