How is the scikit Lasso/LARS used as a regression feature selection tool?

Question

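I have about 22 predictor variables, x_i, that I would like to reduce to a smaller number that best describes y. A basic problem... However, I am quite unclear on how to use scikit and linear_model.LassoLars to perform this task.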

From their example documentation, the code is something like:

from sklearn.linear_model import Lasso

alpha = 0.1
lasso = Lasso(alpha=alpha)

y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)

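So it performs the regression and the lasso, but I am not sure how to use y_pred_lasso to output what I want, i.e. the variables from the 22 original predictors that best describe y.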

Answer 1

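You can access the selected features through the coef_ attribute after calling fit on a Lasso instance. This attribute stores the weight of each feature; the features with a non-zero weight are the ones the lasso kept.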

>>> lasso = Lasso(alpha=alpha).fit(X_train, y_train)
>>> lasso.coef_ != 0
array([ True,  True,  True, False, False,  True,  True,  True,  True,
        True,  True,  True,  True], dtype=bool)
>>> import numpy as np
>>> np.nonzero(lasso.coef_)
(array([ 0,  1,  2,  5,  6,  7,  8,  9, 10, 11, 12]),)
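
For a case closer to the one in the question, here is a minimal, self-contained sketch. It assumes synthetic data from make_regression with 22 predictors (5 of them informative) as a stand-in for the real X and y; the alpha value and the helper name selected_features are illustrative choices, not something taken from the question. The same coef_ check works for both Lasso and LassoLars.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, LassoLars
from sklearn.preprocessing import StandardScaler

# Assumption: synthetic stand-in for the asker's data.
# 22 predictors, of which only 5 actually drive y.
X, y, true_coef = make_regression(
    n_samples=200,
    n_features=22,
    n_informative=5,
    noise=1.0,
    coef=True,
    random_state=0,
)

# The L1 penalty depends on feature scale, so standardize the columns first.
X = StandardScaler().fit_transform(X)


def selected_features(model, X, y):
    """Hypothetical helper: fit the model, return indices of non-zero weights."""
    model.fit(X, y)
    return np.flatnonzero(model.coef_)


# alpha=1.0 is an arbitrary illustrative value, not a recommendation.
lasso_idx = selected_features(Lasso(alpha=1.0), X, y)
lars_idx = selected_features(LassoLars(alpha=1.0), X, y)

print("Lasso kept columns:    ", lasso_idx)
print("LassoLars kept columns:", lars_idx)
print("Truly informative:     ", np.flatnonzero(true_coef))
```

The returned indices can be used to slice the original data, e.g. X[:, lasso_idx]. Raising alpha pushes more weights to exactly zero, so increasing it is one way to get down to a smaller number of predictors.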

