how to resolve the error: AttributeError: 'generator' object has no attribute 'endswith'
When I try to run this code to preprocess text, I get the error below. Someone else has run into a similar problem, but that post doesn't have enough detail.
I'm including everything in context here, in the hope that it helps reviewers help us better.
Here is the function:
def preprocessing(text):
    #text=text.decode("utf8")
    #tokenize into words
    tokens=[word for sent in nltk.sent_tokenize(text) for word in
            nltk.word_tokenize(sent)]
    #remove stopwords
    stop=stopwords.words('english')
    tokens=[token for token in tokens if token not in stop]
    #remove words less than three letters
    tokens=[word for word in tokens if len(word)>=3]
    #lower capitalization
    tokens=[word.lower() for word in tokens]
    #lemmatization
    lmtzr=WordNetLemmatizer()
    tokens=[lmtzr.lemmatize(word for word in tokens)]
    preprocessed_text=' '.join(tokens)
    return preprocessed_text
Here is where the function is called:
#open the text data from disk location
sms=open('C:/Users/Ray/Documents/BSU/Machine_learning/Natural_language_Processing_Pyhton_And_NLTK_Chap6/smsspamcollection/SMSSpamCollection')
sms_data=[]
sms_labels=[]
csv_reader=csv.reader(sms,delimiter='\t')
for line in csv_reader:
    #adding the sms_id
    sms_labels.append(line[0])
    #adding the cleaned text by calling the preprocessing method
    sms_data.append(preprocessing(line[1]))
sms.close()
The result:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-38-b42d443adaa6> in <module>()
      8     sms_labels.append(line[0])
      9     #adding the cleaned text by calling the preprocessing method
---> 10     sms_data.append(preprocessing(line[1]))
     11 sms.close()

<ipython-input-37-69ef4cd83745> in preprocessing(text)
     12     #lemmatization
     13     lmtzr=WordNetLemmatizer()
---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
     15     preprocessed_text=' '.join(tokens)
     16     return preprocessed_text

~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38
     39     def lemmatize(self, word, pos=NOUN):
---> 40         lemmas = wordnet._morphy(word, pos)
     41         return min(lemmas, key=len) if lemmas else word
     42

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in _morphy(self, form, pos, check_exceptions)
   1798
   1799         # 1. Apply rules once to the input to get y1, y2, y3, etc.
-> 1800         forms = apply_rules([form])
   1801
   1802         # 2. Return all that are in the database (and check the original too)

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in apply_rules(forms)
   1777         def apply_rules(forms):
   1778             return [form[:-len(old)] + new
-> 1779                     for form in forms
   1780                     for old, new in substitutions
   1781                     if form.endswith(old)]

~\Anaconda3\lib\site-packages\nltk\corpus\reader\wordnet.py in <listcomp>(.0)
   1779                     for form in forms
   1780                     for old, new in substitutions
-> 1781                     if form.endswith(old)]
   1782
   1783         def filter_forms(forms):

AttributeError: 'generator' object has no attribute 'endswith'
I believe the error is coming from the source code of nltk.corpus.reader.wordnet.
The full source can be seen on the NLTK documentation pages; it is too long to post here, but here is the original link:
Thanks for your help.
The error message and traceback point to the source of the problem:
in preprocessing(text)
     12     #lemmatization
     13     lmtzr=WordNetLemmatizer()
---> 14     tokens=[lmtzr.lemmatize(word for word in tokens)]
     15     preprocessed_text=' '.join(tokens)
     16     return preprocessed_text

~\Anaconda3\lib\site-packages\nltk\stem\wordnet.py in lemmatize(self, word, pos)
     38
     39     def lemmatize(self, word, pos=NOUN):
It is clear from the function's signature (word, not words) and from the error ("has no attribute 'endswith'"; endswith() is in fact a str method) that lemmatize() expects a single word, but here:

tokens=[lmtzr.lemmatize(word for word in tokens)]

you are passing it a generator expression.
What you want is:

tokens = [lmtzr.lemmatize(word) for word in tokens]
Note: you mentioned:

I believe the error is coming from the source code for nltk.corpus.reader.wordnet

The error is indeed raised inside that package, but it "is coming from" (in the sense of "caused by") your code passing it the wrong kind of argument ;)
Hope this helps you debug this kind of problem on your own next time.
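The mistake generalizes beyond NLTK: any function that expects a single string will fail the same way if the generator expression ends up inside the call parentheses. A minimal sketch of the difference, using a plain str.upper() stand-in (shout is a hypothetical helper, chosen so the example runs without NLTK installed):

```python
def shout(word):
    # Like lemmatize(), this expects a single string argument.
    return word.upper()

tokens = ["cats", "running"]

# Wrong: the whole generator expression is passed as ONE argument,
# so shout() tries to call .upper() on a generator object.
try:
    bad = [shout(word for word in tokens)]
except AttributeError as exc:
    err = str(exc)  # 'generator' object has no attribute 'upper'

# Right: the call sits INSIDE the list comprehension and is made
# once per word, each time with a plain string.
good = [shout(word) for word in tokens]
print(good)  # ['CATS', 'RUNNING']
```

The only difference between the two lines is where the closing parenthesis of the call falls; that single character decides whether the function sees each word or one generator.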