TfidfTransformer and stop words

Question

I am importing TfidfTransformer from sklearn and trying to pass the stop_words parameter, but it raises an error.

from sklearn.feature_extraction.text import TfidfTransformer
tfidf = TfidfTransformer(stop_words='english')


TypeError                                 Traceback (most recent call last)
<ipython-input-16-1315a209c082> in <module>
      1 from sklearn.feature_extraction.text import TfidfTransformer
----> 2 tfidf = TfidfTransformer(stop_words='english')

TypeError: __init__() got an unexpected keyword argument 'stop_words'

How can I fix this error?

Answer 1

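I think you meant to use TfidfVectorizer, which accepts a stop_words parameter. TfidfTransformer only reweights an existing matrix of term counts, so it never sees the raw text and has no stop-word option. See the TfidfVectorizer documentation.

Example: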

from sklearn.feature_extraction.text import TfidfVectorizer
corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]
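# stop_words='english' drops common English words before TF-IDF weighting
vectorizer = TfidfVectorizer(stop_words='english')
# X is a sparse matrix with one row per document in the corpus
X = vectorizer.fit_transform(corpus)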
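
If you would rather keep TfidfTransformer in your code, one option is to remove the stop words at the counting step with CountVectorizer and then pass the counts to TfidfTransformer. The sketch below assumes default settings everywhere else and reuses the corpus from the example above; the variable names are only illustrative. It also checks that the two-step result matches TfidfVectorizer on this corpus.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

corpus = [
    'This is the first document.',
    'This document is the second document.',
    'And this is the third one.',
    'Is this the first document?',
]

# Step 1: tokenize and count terms; stop words are filtered here.
count_vectorizer = CountVectorizer(stop_words='english')
counts = count_vectorizer.fit_transform(corpus)

# Step 2: reweight the raw counts with TF-IDF (no stop_words argument here).
X_two_step = TfidfTransformer().fit_transform(counts)

# One-step equivalent for comparison.
X_one_step = TfidfVectorizer(stop_words='english').fit_transform(corpus)

# True when both approaches produce the same matrix.
same = np.allclose(X_two_step.toarray(), X_one_step.toarray())
print(same)
```

The two-step form is mainly useful when you already have a count matrix or want to reuse the counts for something else; otherwise TfidfVectorizer is the shorter route.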
