如何知道一个词的具体TF-IDF值?
How to know specific TF-IDF value of a word?
如何使用 TfidfVectorizer 函数知道特定单词的值?
比如我的代码是:
docs = []
docs.append("this is sentence number one")
docs.append("this is sentence number two")
vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
sklearn_representation = vectorizer.fit_transform(docs)
现在,我怎么知道句子2(docs[1])中"sentence"的TF-IDF值?
您需要使用 vectorizer
的 vocabulary_
属性,它是术语到特征索引的映射。
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = []
>>> docs.append("this is sentence number one")
>>> docs.append("this is sentence number two")
>>> vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
>>> x = vectorizer.fit_transform(docs)
>>> x.todense()
matrix([[ 0.70710678, 0.70710678],
[ 0.70710678, 0.70710678]])
>>> vectorizer.vocabulary_['sentence']
1
>>> c = vectorizer.vocabulary_['sentence']
>>> x[:,c]
<2x1 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> x[:,c].todense()
matrix([[ 0.70710678],
[ 0.70710678]])
如何使用 TfidfVectorizer 函数知道特定单词的值? 比如我的代码是:
docs = []
docs.append("this is sentence number one")
docs.append("this is sentence number two")
vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
sklearn_representation = vectorizer.fit_transform(docs)
现在,我怎么知道句子2(docs[1])中"sentence"的TF-IDF值?
您需要使用 vectorizer
的 vocabulary_
属性,它是术语到特征索引的映射。
>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> docs = []
>>> docs.append("this is sentence number one")
>>> docs.append("this is sentence number two")
>>> vectorizer = TfidfVectorizer(norm='l2',min_df=0, use_idf=True, smooth_idf=True, stop_words='english', sublinear_tf=True)
>>> x = vectorizer.fit_transform(docs)
>>> x.todense()
matrix([[ 0.70710678, 0.70710678],
[ 0.70710678, 0.70710678]])
>>> vectorizer.vocabulary_['sentence']
1
>>> c = vectorizer.vocabulary_['sentence']
>>> x[:,c]
<2x1 sparse matrix of type '<class 'numpy.float64'>'
with 2 stored elements in Compressed Sparse Row format>
>>> x[:,c].todense()
matrix([[ 0.70710678],
[ 0.70710678]])