How does SelectFromModel() from from_model.py work?

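import sklearn.ensemble as ske  # assumed meaning of the `ske` alias
from sklearn.feature_selection import SelectFromModel

fsel = ske.ExtraTreesClassifier().fit(X, y)
model = SelectFromModel(fsel, prefit=True)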

I am trying to train a dataset on ExtraTreesClassifier. How does the SelectFromModel() function decide the importance values, and what does it return?

As stated in the documentation for SelectFromModel:

threshold : string, float, optional default None

The threshold value to use for feature selection. Features whose importance is greater or equal are kept while the others are discarded. If “median” (resp. “mean”), then the threshold value is the median (resp. the mean) of the feature importances. A scaling factor (e.g., “1.25*mean”) may also be used. If None and if the estimator has a parameter penalty set to l1, either explicitly or implicitly (e.g, Lasso), the threshold used is 1e-5. Otherwise, “mean” is used by default.

In your case, threshold is left at its default value, None, so the mean of the ExtraTreesClassifier's feature_importances_ is used as the threshold.
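To make the threshold rule concrete, here is a small sketch that fits the selector with several threshold settings and records the resulting threshold_ value for each. The iris data, n_estimators=50 and random_state=0 are illustrative choices, not part of the question; the fixed seed is only there so every fit produces the same forest and the values are comparable.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

X, y = load_iris(return_X_y=True)

thresholds = {}
for setting in (None, "mean", "median", "1.25*mean"):
    # Same seed on every pass, so each fit yields identical importances.
    clf = ExtraTreesClassifier(n_estimators=50, random_state=0)
    selector = SelectFromModel(clf, threshold=setting).fit(X, y)
    thresholds[setting] = selector.threshold_

# Importances of the last fitted estimator (identical across passes).
importances = selector.estimator_.feature_importances_

for setting, value in thresholds.items():
    print(setting, value)
```

With threshold=None the value should match the "mean" setting. Because a tree ensemble's feature_importances_ are normalized to sum to 1, that mean is simply 1 / n_features, which is 0.25 for the four iris features.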

Example

>>> from sklearn.datasets import load_iris
>>> from sklearn.ensemble import ExtraTreesClassifier
>>> from sklearn.feature_selection import SelectFromModel

>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> clf = ExtraTreesClassifier()
>>> model = SelectFromModel(clf)
>>> model.fit(X, y)
SelectFromModel(estimator=ExtraTreesClassifier(bootstrap=False, class_weight=None, criterion='gini',
           max_depth=None, max_features='auto', max_leaf_nodes=None,
           min_impurity_decrease=0.0, min_impurity_split=None,
           min_samples_leaf=1, min_samples_split=2,
           min_weight_fraction_leaf=0.0, n_estimators=10, n_jobs=1,
           oob_score=False, random_state=None, verbose=0, warm_start=False),
        norm_order=1, prefit=False, threshold=None)
>>> model.threshold_
0.25
>>> model.estimator_.feature_importances_
array([0.09790258, 0.02597852, 0.35586554, 0.52025336])
>>> model.estimator_.feature_importances_.mean()
0.25

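As you can see, the fitted model is an instance of SelectFromModel with ExtraTreesClassifier as its estimator. The threshold is 0.25, which is also the mean of the fitted estimator's feature importances. Based on the feature importances and the threshold, the model keeps only the 3rd and 4th features of the input data (those whose importance is at least the threshold). You can use the transform method of the fitted SelectFromModel instance to select these features from the input data.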
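As a sketch of that last step, the snippet below fits the selector, asks it which columns it keeps via get_support(), and applies transform to reduce the input. The n_estimators=100 and random_state=0 settings and the variable names are illustrative assumptions; which features survive can vary with the seed and the scikit-learn version.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

iris = load_iris()
X, y = iris.data, iris.target

selector = SelectFromModel(ExtraTreesClassifier(n_estimators=100, random_state=0))
selector.fit(X, y)

# Boolean mask over the input columns: True means the feature is kept.
mask = selector.get_support()

# The same rule written out by hand: importance >= threshold.
expected_mask = selector.estimator_.feature_importances_ >= selector.threshold_

# transform returns only the kept columns, in their original order.
X_selected = selector.transform(X)

kept_names = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(kept_names, X_selected.shape)
```

So what SelectFromModel returns from transform is not a new model or a ranking, but the original array with the low-importance columns dropped.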