How does TfidfVectorizer take its arguments?

I would like to better understand how TfidfVectorizer works. I don't understand how to use follow-up functions such as get_feature_names.

Here is a reproducible example of my problem:

from sklearn.feature_extraction.text import TfidfVectorizer

text = ['It was a queer, sultry summer', 'the summer they electrocuted the Rosenbergs',
    'and I didn’t know what I was doing in New York', 'I m stupid about executions',
    'The idea of being electrocuted makes me sick',
    'and that’s all there was to read about in the papers',
    'goggle eyed headlines staring up at me on every street corner and at the fusty',
    'peanut-selling mouth of every subway', 'It had nothing to do with me',
    'but I couldn’t help wondering what it would be like',
    'being burned alive all along your nerves']


tfidf_vect = TfidfVectorizer(max_df=0.7,
                             min_df=0.01,
                             use_idf=True,
                             ngram_range=(1, 2))

tfidf_mat = tfidf_vect.fit_transform(text)
print(tfidf_mat)
features = tfidf_vect.get_feature_names()
print(features)

In this example, I understand that my object tfidf_vect defines all the parameters I want TfidfVectorizer to use, and that I then apply it to text to obtain the resulting object tfidf_mat.

What I don't understand is why, to extract additional information from my tf-idf analysis, I call the function on the object tfidf_vect rather than on tfidf_mat.

How does the command tfidf_vect.get_feature_names() know that it applies to text, if text is not specified anywhere in the call?

The command tfidf_vect.get_feature_names() works because tfidf_vect is an instance of the class TfidfVectorizer. This class has certain attributes (see the documentation). These attributes can change after calling methods of the class, such as the method fit_transform. get_feature_names has access to the same attributes of the class instance as fit_transform does, so it can read whatever fit_transform stored there. You might want to read more about classes, methods, attributes, and so on.
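To make the idea of shared instance attributes concrete, here is a minimal sketch in plain Python. ToyVectorizer is a hypothetical, heavily simplified stand-in written for illustration only; it is not how scikit-learn implements TfidfVectorizer, and it computes raw counts rather than tf-idf weights.

```python
class ToyVectorizer:
    """Hypothetical stand-in for a vectorizer (not scikit-learn's code)."""

    def __init__(self, lowercase=True):
        # Only parameters are stored here; no data has been seen yet.
        self.lowercase = lowercase

    def fit_transform(self, documents):
        tokenized = [
            (doc.lower() if self.lowercase else doc).split()
            for doc in documents
        ]
        # What was learned from the data is stored ON THE INSTANCE.
        self.vocabulary_ = sorted({tok for doc in tokenized for tok in doc})
        # The return value is a separate object: one row of counts per document.
        return [[doc.count(term) for term in self.vocabulary_] for doc in tokenized]

    def get_feature_names(self):
        # Reads the attribute that fit_transform stored earlier.
        return list(self.vocabulary_)


toy = ToyVectorizer()
counts = toy.fit_transform(["summer was sultry", "the summer"])
print(toy.get_feature_names())  # ['sultry', 'summer', 'the', 'was']
print(counts)                   # [[1, 1, 0, 1], [0, 1, 1, 0]]
```

The matrix counts is just numbers and knows nothing about which word each column stands for; that mapping lives on toy, which is why the feature names are requested from the vectorizer and not from the matrix.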

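So: tfidf_mat simply holds the return value of fit_transform(), which is a sparse matrix of shape (n_samples, n_features). After fit_transform() has been called, the attributes of tfidf_vect have changed, and they can be accessed by any method of that class instance, such as get_feature_names().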
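The same thing can be observed on a real TfidfVectorizer by checking its fitted attribute vocabulary_ before and after the call. This is a small sketch with made-up example documents and default parameters, not the question's data. It also assumes a note on versions: recent scikit-learn releases provide get_feature_names_out() in place of get_feature_names(), so the sketch uses whichever one is available.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Made-up documents for illustration (not the text from the question).
docs = ["the cat sat", "the dog sat", "a bird flew"]

vect = TfidfVectorizer()
# Before fitting, the learned vocabulary does not exist yet.
fitted_before = hasattr(vect, "vocabulary_")

mat = vect.fit_transform(docs)
# After fitting, the instance carries the learned vocabulary.
fitted_after = hasattr(vect, "vocabulary_")

# Use the newer method name when present, the older one otherwise.
if hasattr(vect, "get_feature_names_out"):
    names = list(vect.get_feature_names_out())
else:
    names = list(vect.get_feature_names())

print(fitted_before, fitted_after)
print(mat.shape)
print(names)
```

With the default tokenizer, single-character tokens such as "a" are ignored, so the matrix has one column per remaining word and names lists those words in column order.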