How does TfidfVectorizer take its arguments?
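I would like to better understand how TfidfVectorizer works. I don't understand how to use follow-up functions such as get_feature_names.
Here is a reproducible example of my problem: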
from sklearn.feature_extraction.text import TfidfVectorizer

text = ['It was a queer, sultry summer', 'the summer they electrocuted the Rosenbergs',
        'and I didn’t know what I was doing in New York', 'I m stupid about executions',
        'The idea of being electrocuted makes me sick',
        'and that’s all there was to read about in the papers',
        'goggle eyed headlines staring up at me on every street corner and at the fusty',
        'peanut-selling mouth of every subway', 'It had nothing to do with me',
        'but I couldn’t help wondering what it would be like',
        'being burned alive all along your nerves']

# Configure the vectorizer; no text has been seen at this point.
tfidf_vect = TfidfVectorizer(max_df=0.7,
                             min_df=0.01,
                             use_idf=True,
                             ngram_range=(1, 2))

# Learn the vocabulary from text and return the tf-idf matrix.
tfidf_mat = tfidf_vect.fit_transform(text)
print(tfidf_mat)

# Note: get_feature_names() was deprecated in scikit-learn 1.0 and removed in 1.2;
# on newer versions call get_feature_names_out() instead.
features = tfidf_vect.get_feature_names()
print(features)
In this example, I believe my object tfidf_vect defines all the parameters I need to apply TfidfVectorizer. I then apply it to text to obtain the resulting object tfidf_mat.
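What I don't understand is why, to extract additional information from my tf-idf analysis, I call a function on the object tfidf_vect rather than on tfidf_mat.
How does the command tfidf_vect.get_feature_names() know that it applies to text, if text is not specified in its definition?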
The command tfidf_vect.get_feature_names() works because tfidf_vect is an instance of the class TfidfVectorizer. This class has certain attributes (see the documentation). These attributes can change after calling methods of the class, such as fit_transform. get_feature_names has access to the same attributes of the class instance as fit_transform does. You might want to read more about classes, methods, attributes, and so on.
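To make the point about instance attributes concrete, here is a minimal sketch in plain Python. ToyVectorizer is a hypothetical class written only for illustration; it is not how scikit-learn implements TfidfVectorizer, and it counts words rather than computing tf-idf. It only shows that one method can store state on the instance and another method can read it later.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a vectorizer; not scikit-learn's implementation."""

    def __init__(self, lowercase=True):
        # Parameters are stored at construction time; no data has been seen yet.
        self.lowercase = lowercase

    def _tokenize(self, doc):
        # Naive whitespace tokenizer, assumed here for simplicity.
        return (doc.lower() if self.lowercase else doc).split()

    def fit_transform(self, docs):
        # Learn the vocabulary and store it on the instance (self).
        tokens = set()
        for doc in docs:
            tokens.update(self._tokenize(doc))
        self.vocabulary_ = sorted(tokens)
        # Return a plain count matrix; it carries numbers only, no term names.
        return [[self._tokenize(doc).count(term) for term in self.vocabulary_]
                for doc in docs]

    def get_feature_names(self):
        # Reads the state that fit_transform stored earlier on the same instance.
        return self.vocabulary_


vect = ToyVectorizer()
mat = vect.fit_transform(["the summer", "a sultry summer"])
print(mat)
print(vect.get_feature_names())
```

The returned mat is just a list of rows, so there is nothing on it to ask for term names; the names live on vect, which is why the lookup method belongs to the vectorizer object.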
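So: tfidf_mat simply holds the return value of fit_transform(), which is a sparse matrix of shape (n_samples, n_features). After fit_transform() has been called, the attributes of tfidf_vect have changed, and any method of that class instance, such as get_feature_names(), can access them.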
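The same behaviour can be checked on the real class. The short sketch below assumes scikit-learn is installed and uses two made-up documents and default parameters rather than the settings from the question. It relies only on the fitted attribute vocabulary_, which maps each term to its column index, so it should not depend on which feature-name method your version provides.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Two short illustrative documents (not the full list from the question).
docs = ["the summer they electrocuted the Rosenbergs",
        "it was a queer sultry summer"]

vect = TfidfVectorizer()

# Before fitting, the learned vocabulary does not exist on the instance.
fitted_before = hasattr(vect, "vocabulary_")

# fit_transform learns the vocabulary, stores it on vect, and returns the matrix.
mat = vect.fit_transform(docs)
fitted_after = hasattr(vect, "vocabulary_")

# vocabulary_ maps term -> column index; sort terms by their column.
terms_by_column = sorted(vect.vocabulary_, key=vect.vocabulary_.get)

print(fitted_before, fitted_after)
print(mat.shape)
print(terms_by_column)
```

The number of columns in mat matches the number of learned terms, but the matrix itself stores only numeric values; the mapping from column to term is kept on the vectorizer.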