"Unwrapping" SklearnClassifier 对象 - NLTK Python

Question

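I have used the SklearnClassifier() wrapper from the NLTK Python package to train a couple of sklearn classifiers (LogisticRegression() and RandomForestClassifier()) on a binary classification problem where the features are text. Is there any function that lets me "unwrap" this object so that I can access things like the parameter estimates (for logistic regression) or the variable importance list from the random forest (or anything else available on the original sklearn object)? The NLTK classifier object can score new instances, so the underlying information must be stored somewhere inside that object. Thanks for your thoughts.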

Answer 1

Your classifier is stored in the wrapper's _clf attribute.

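from nltk.classify import SklearnClassifier
from sklearn.neural_network import MLPClassifier
classifier = SklearnClassifier(MLPClassifier())
mlp = classifier._clf  # the underlying scikit-learn estimator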

From the documentation found at http://www.nltk.org/_modules/nltk/classify/scikitlearn.html:

def __init__(self, estimator, dtype=float, sparse=True):
    """
    :param estimator: scikit-learn classifier object.

    :param dtype: data type used when building feature array.
        scikit-learn estimators work exclusively on numeric data. The
        default value should be fine for almost all situations.

    :param sparse: Whether to use sparse matrices internally.
        The estimator must support these; not all scikit-learn classifiers
        do (see their respective documentation and look for "sparse
        matrix"). The default value is True, since most NLP problems
        involve sparse feature sets. Setting this to False may take a
        great amount of memory.
    :type sparse: boolean.
    """
    self._clf = estimator
    self._encoder = LabelEncoder()
    self._vectorizer = DictVectorizer(dtype=dtype, sparse=sparse)
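
To connect this back to the question, here is a minimal sketch of reading logistic regression coefficients and random forest importances through the wrapper. It assumes the private attributes _clf, _vectorizer and _encoder shown in the constructor above are still present in your NLTK version (they are not part of the public API, so they may change), and the toy training data is made up purely for illustration.

```python
from nltk.classify import SklearnClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: NLTK expects (featureset dict, label) pairs.
train = [
    ({"good": 1, "fun": 1}, "pos"),
    ({"great": 1, "good": 1}, "pos"),
    ({"bad": 1, "boring": 1}, "neg"),
    ({"awful": 1, "bad": 1}, "neg"),
]

# train() fits the vectorizer, the label encoder and the estimator,
# then returns the wrapper itself.
logreg_wrapper = SklearnClassifier(LogisticRegression()).train(train)
forest_wrapper = SklearnClassifier(
    RandomForestClassifier(n_estimators=10, random_state=0)
).train(train)

# Unwrap the fitted scikit-learn estimators (private attribute).
logreg = logreg_wrapper._clf
forest = forest_wrapper._clf

# Column order of the feature matrix, as fitted by each wrapper's DictVectorizer.
logreg_features = logreg_wrapper._vectorizer.feature_names_
forest_features = forest_wrapper._vectorizer.feature_names_

# Label order as fitted by the LabelEncoder; for a binary problem,
# positive coefficients push towards labels[1].
labels = logreg_wrapper._encoder.classes_

coefficients = dict(zip(logreg_features, logreg.coef_[0]))
importances = dict(zip(forest_features, forest.feature_importances_))

print(labels)
print(coefficients)
print(importances)
```

Each wrapper carries its own vectorizer, so the feature names should be taken from the same wrapper as the estimator they describe.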

"Unwrapping" SklearnClassifier 对象 - NLTK Python

"Unwrapping" SklearnClassifier Object - NLTK Python

python

nltk

scikit-learn