words.words() from the nltk corpus seemingly contains strange, non-valid words

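This code loops through every word in words.words() from the nltk library, lowercases it, and pushes it into an array. It then checks every word in the array against the same list to see whether it is an actual word. Somehow, many of the words it prints are strange words that don't look real at all, like "adighe". What's going on here?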

import nltk
from nltk.corpus import words

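# Collect a lowercased copy of every entry in the corpus word list.
test_array = []
for i in words.words():
    i = i.lower()
    test_array.append(i)

# Print each lowercased entry that is not found in the original list.
for i in test_array:
    if i not in words.words():
        print(i)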

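I don't think there's anything mysterious going on here. The first such example I found is "Aani", "the dog-headed ape sacred to the Egyptian god Thoth". Because it is a proper noun, the word list contains the capitalized "Aani" but not the lowercase "aani", and the membership test is case-sensitive.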

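According to dictionary.com, "Adighe" is an alternative spelling of "Adygei", another proper noun, which names a region of Russia. Since it is also the name of a language, I suppose you could argue that "adighe" should be allowed as well. This particular word list would argue that it shouldn't.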
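
To make the case-sensitivity point concrete, here is a minimal sketch that reproduces the effect without NLTK. The function name case_only_misses and the list sample_words are hypothetical stand-ins of mine; sample_words only imitates a few entries of the kind found in words.words(), it is not the real corpus.

```python
def case_only_misses(word_list):
    """Return the lowercased entries whose lowercase form is not itself in word_list."""
    # A set makes each membership test fast; testing against a list rescans it every time.
    vocabulary = set(word_list)
    misses = []
    for word in word_list:
        lowered = word.lower()
        # String comparison is case-sensitive, so "aani" does not match "Aani".
        if lowered not in vocabulary:
            misses.append(lowered)
    return misses


# Hypothetical stand-in for words.words(); the real list is far longer.
sample_words = ["Aani", "Adighe", "aardvark", "A", "a"]
print(case_only_misses(sample_words))  # prints ['aani', 'adighe']
```

Assuming NLTK is installed and the corpus has been fetched with nltk.download('words'), passing words.words() to the same function should list only entries that are capitalized in the corpus and have no lowercase twin. If the goal is a case-insensitive "is this a word" check, one option is to lowercase both sides, for example by building the set from w.lower() for each w in the list.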