AttributeError: lower not found
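I am doing document classification and getting an accuracy of up to 76%. To predict the category of a document, I made the following call: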
doc_clf.predict(tf_idf.transform((count_vect.transform([r'document']))))
I get the following error:
File "/usr/local/lib/python3.5/dist-packages/sklearn/utils/metaestimators.py", line 115, in <lambda>
out = lambda *args, **kwargs: self.fn(obj, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/sklearn/pipeline.py", line 306, in predict
Xt = transform.transform(Xt)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 923, in transform
_, X = self._count_vocab(raw_documents, fixed_vocab=True)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 792, in _count_vocab
for feature in analyze(doc):
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 266, in <lambda>
tokenize(preprocess(self.decode(doc))), stop_words)
File "/usr/local/lib/python3.5/dist-packages/sklearn/feature_extraction/text.py", line 232, in <lambda>
return lambda x: strip_accents(x.lower())
File "/usr/local/lib/python3.5/dist-packages/scipy/sparse/base.py", line 647, in __getattr__
raise AttributeError(attr + " not found")
How can I fix this error? Are there other ways to further improve the accuracy?
I have shared a link so you can see the full code: Full Code
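In your code, doc_clf is a pipeline, so tf_idf.transform() and count_vect.transform() are handled by the pipeline automatically. Because you transformed the document yourself first, the pipeline's CountVectorizer received a sparse matrix instead of raw strings and tried to call .lower() on it, which is what the last frames of the traceback show.
You should only call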
category = doc_clf.predict([r'document'])
When the document passes through the pipeline, it is transformed automatically by CountVectorizer and TfidfTransformer.
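As a minimal sketch of the point above: the full code from the question is not reproduced here, so the training data, the step names, and the choice of MultinomialNB as the final classifier below are all assumptions made for illustration. Only the names doc_clf and category come from the thread. The idea is simply that a fitted pipeline takes raw strings and runs the vectorizing steps itself.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Toy training data (hypothetical; stands in for the real corpus).
train_docs = [
    "the match ended with a late goal",
    "the team won the league title",
    "the new phone has a faster processor",
    "the laptop ships with more memory",
]
train_labels = ["sports", "sports", "tech", "tech"]

# Assumed pipeline layout: counts -> tf-idf weights -> classifier.
# The classifier used in the original code is not known; MultinomialNB
# is only a placeholder.
doc_clf = Pipeline([
    ("vect", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", MultinomialNB()),
])
doc_clf.fit(train_docs, train_labels)

# Pass raw text as a list of strings. Do NOT call count_vect.transform()
# or tf_idf.transform() first; the pipeline applies both steps itself.
category = doc_clf.predict(["the team scored a goal"])
print(category[0])
```

Note that predict() expects an iterable of documents, so a single document still has to be wrapped in a list.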