How to get the feature names corresponding to chi-square feature selection scores in scikit-learn
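I am using scikit-learn for feature selection, and I want the score of every unigram in the text. I can get the scores, but how do I map them back to the actual feature names?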
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
Texts = ["should schools have uniform", "schools discipline", "legalize marriage", "marriage culture"]
labels = ["3", "3", "7", "7"]
vectorizer = CountVectorizer()
term_doc = vectorizer.fit_transform(Texts)
ch2 = SelectKBest(chi2, k="all")  # k is keyword-only in recent scikit-learn releases
X_train = ch2.fit_transform(term_doc, labels)
print(ch2.scores_)
This prints the scores, but how do I know which feature name each score belongs to?
It's right there in the documentation: