TfidfVectorizer from sklearn - how to get the matrix
I want to get the matrix out of a TfidfVectorizer object from sklearn. Here is my code:
from sklearn.feature_extraction.text import TfidfVectorizer
text = ["The quick brown fox jumped over the lazy dog.",
"The dog.",
"The fox"]
vectorizer = TfidfVectorizer()
vectorizer.fit_transform(text)
Here is what I tried, and the error it returned:
vectorizer.toarray()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last) <ipython-input-117-76146e626284> in <module>()
----> 1 vectorizer.toarray()
AttributeError: 'TfidfVectorizer' object has no attribute 'toarray'
Another attempt:
vectorizer.todense()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-118-6386ee121184> in <module>()
----> 1 vectorizer.todense()
AttributeError: 'TfidfVectorizer' object has no attribute 'todense'
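.fit_transform itself returns the document-term matrix; the vectorizer object does not hold it. So assign the return value and work with that: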
matrix = vectorizer.fit_transform(text)
matrix.todense()
converts the sparse matrix to a dense matrix.
matrix.shape
gives you the shape of the matrix.
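As a minimal sketch of the steps above, using the same three sentences from the question: the return value of fit_transform is kept in a variable, checked for sparsity, and then converted. The use of scipy's issparse and of toarray (which yields a plain NumPy array, whereas todense yields a numpy.matrix) is my own addition for illustration, not something the question asked for.

```python
from scipy.sparse import issparse
from sklearn.feature_extraction.text import TfidfVectorizer

text = ["The quick brown fox jumped over the lazy dog.",
        "The dog.",
        "The fox"]

vectorizer = TfidfVectorizer()

# fit_transform returns the matrix; the vectorizer only keeps the fitted
# vocabulary and idf weights, which is why vectorizer.toarray() fails.
matrix = vectorizer.fit_transform(text)

print(issparse(matrix))  # True: the result is a SciPy sparse matrix
print(matrix.shape)      # (3, 8): 3 documents, 8 distinct terms

# toarray() gives a regular numpy.ndarray; todense() gives a numpy.matrix.
dense = matrix.toarray()
print(dense)
```

For a large corpus it is usually better to keep the matrix sparse and only convert small slices, since the dense form stores every zero.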
Note that vectorizer.fit_transform returns the document-term matrix you are after. So save what it returns and call todense on it, since it comes back in sparse format:
Returns: X : sparse matrix, [n_samples, n_features].
Tf-idf-weighted document-term matrix.
a = vectorizer.fit_transform(text)
a.todense()
matrix([[0.36388646, 0.27674503, 0.27674503, 0.36388646, 0.36388646,
0.36388646, 0.36388646, 0.42983441],
[0. , 0.78980693, 0. , 0. , 0. ,
0. , 0. , 0.61335554],
[0. , 0. , 0.78980693, 0. , 0. ,
0. , 0. , 0.61335554]])