NLTK's spell checker is not working correctly
I want to check the spelling of a sentence in Python using NLTK. The built-in spell checker is not working correctly: it reports 'with' and 'and' as wrong spellings.
def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(str):
    return "".join(c for c in str if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "
noPunct = removePunct(l.lower())
if(SpellChecker(noPunct)):
    print(l)
    print(noPunct)
Can anyone tell me the reason?
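Those words are reported as misspelled because they are stopwords, and stopwords are not contained in WordNet (check the FAQs). So WN.synsets() returns an empty list for them, and the check treats that as a wrong spelling.
Instead, you can use the stopwords from the NLTK corpus to check for such words.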
#Add these lines:
import nltk
from nltk.corpus import wordnet as WN
from nltk.corpus import stopwords
stop_words_en = set(stopwords.words('english'))

def tokens(sent):
    return nltk.word_tokenize(sent)

def SpellChecker(line):
    for i in tokens(line):
        strip = i.rstrip()
        if not WN.synsets(strip):
            if strip in stop_words_en:    # <--- Check whether it's in stopword list
                print("No mistakes :" + i)
            else:
                print("Wrong spellings : " + i)
        else:
            print("No mistakes :" + i)

def removePunct(str):
    return "".join(c for c in str if c not in ('!', '.', ':', ','))

l = "Attempting artiness With black & white and clever camera angles, the movie disappointed - became even more ridiculous - as the acting was poor and the plot and lines almost non-existent. "
noPunct = removePunct(l.lower())
if(SpellChecker(noPunct)):
    print(l)
    print(noPunct)
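
For what it's worth, the same idea can be sketched as a small function that returns its results instead of printing them. This is only a minimal sketch, not the answer's exact code: check_spelling, in_lexicon, toy_lexicon and toy_stop_words are hypothetical names introduced here, and the toy sets stand in for WordNet and the NLTK stopword list so the example runs without downloading any corpora. Skipping tokens that contain no letters or digits (such as "&" and "-") is also an assumption on my part.

```python
def check_spelling(tokens, in_lexicon, stop_words):
    """Split tokens into (known, unknown) lists.

    in_lexicon: callable returning True when the lexicon has an entry for a word.
    stop_words: set of function words that the lexicon is known to omit.
    """
    known, unknown = [], []
    for token in tokens:
        word = token.strip().lower()
        # Assumption: tokens with no letters or digits ("&", "-") are
        # punctuation, so they are skipped rather than flagged.
        if not any(ch.isalnum() for ch in word):
            continue
        # A word counts as known if the lexicon has it OR it is a stopword.
        if in_lexicon(word) or word in stop_words:
            known.append(token)
        else:
            unknown.append(token)
    return known, unknown


# Toy stand-ins (hypothetical data) for WordNet and the stopword corpus.
toy_lexicon = {"movie", "plot", "poor", "acting"}
toy_stop_words = {"the", "and", "with", "was"}

known, unknown = check_spelling(
    ["The", "acting", "was", "poor", "&", "the", "plott"],
    in_lexicon=lambda w: w in toy_lexicon,
    stop_words=toy_stop_words,
)
print(known)    # ['The', 'acting', 'was', 'poor', 'the']
print(unknown)  # ['plott']
```

With NLTK installed and the corpora downloaded, the toy arguments could presumably be swapped for in_lexicon=lambda w: bool(WN.synsets(w)) and stop_words=stop_words_en from the code above. Returning lists also makes an `if` test on the result meaningful, whereas a function that only prints returns None.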