How to select the top 10 features from a list of features and a list of weight coefficients (Logistic Regression)?
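I want to select the top 5 features of my Logistic Regression model. I currently have two arrays: one contains all the feature names, and the other contains the coefficients from model.coef_, where model = LogisticRegression().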
feature_list = ['ball', 'cat', 'apple',....,] # this has 108 elements
coefficents = lr.coef_
print(coefficents[0])
This prints:
[ 2.07587361e-04 5.59531750e-04 0.00000000e+00 0.00000000e+00
-5.16353886e-02 ...... 1.66633057e-02] #this also has 108 elements
When I try to sort the coefficient values, I get different values:
sorted_index = np.argsort(coefficents[0])
print(sorted_index)
[ 22 91 42 15 52 31 16 32 86 .... 17 106] #this has 108 values
How do I get the correct top 5 most important features from these two arrays?
argsort sorts in ascending order, but you want descending order (largest to smallest).
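This also explains the "different values" in the question: np.argsort returns the indices that would sort the array, not the sorted values, and it orders them from smallest to largest. A minimal sketch with made-up coefficients (illustrative only, not the question's data):

```python
import numpy as np

# Illustrative coefficients, made up for this sketch
coef = np.array([0.2, -0.5, 0.9, 0.1])

asc_idx = np.argsort(coef)         # indices that sort coef ascending
print(asc_idx)                     # [1 3 0 2]
print(coef[asc_idx])               # indexing with them gives the sorted values

desc_idx = np.argsort(coef)[::-1]  # reverse the indices for descending order
print(desc_idx)                    # [2 0 3 1]
```

Reversing with [::-1] and negating before argsort (as in the example below) give the same order here; they can only differ in how ties are ordered.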
Here is a simple example:
import numpy as np
feature_list = ['ball', 'cat', 'apple', 'house', 'tree', 'school', 'child']
coeff = np.array([0.7, 0.3, 0.8, 0.2, 0.4, 0.1, 0.9])
# negate the coeff. to sort them in descending order
idx = (-coeff).argsort()
# map index to feature list
desc_feature = [feature_list[i] for i in idx]
# select the top 5 features
top_feature = desc_feature[:5]
print(top_feature)
The output is your top 5 features:
['child', 'apple', 'ball', 'tree', 'cat']
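Applied to the setup in the question, where lr.coef_ for a binary LogisticRegression has shape (1, n_features) (hence coefficents[0]), the same idea can be wrapped in a small helper. The function top_k_features and its by_magnitude option are hypothetical, and the coefficient array is a made-up stand-in for a fitted model's coef_. Ranking by absolute value is an assumption about what "important" means: a large negative coefficient also influences the prediction strongly. Coefficient magnitudes are only comparable when the features are on similar scales (e.g. after standardization).

```python
import numpy as np

def top_k_features(feature_names, coef_row, k=5, by_magnitude=False):
    """Hypothetical helper: return (name, coefficient) pairs for the top k features.

    If by_magnitude is True, rank by absolute value instead of signed value.
    """
    coef_row = np.asarray(coef_row)
    if len(feature_names) != coef_row.shape[0]:
        raise ValueError("feature_names and coef_row must have the same length")
    scores = np.abs(coef_row) if by_magnitude else coef_row
    idx = np.argsort(-scores)[:k]  # indices of the k largest scores, descending
    return [(feature_names[i], float(coef_row[i])) for i in idx]

# Made-up stand-in for lr.coef_ of a binary model: shape (1, n_features)
feature_list = ['ball', 'cat', 'apple', 'house', 'tree', 'school', 'child']
coef_ = np.array([[0.7, 0.3, -0.8, 0.2, 0.4, 0.1, 0.9]])

print(top_k_features(feature_list, coef_[0], k=5))
# [('child', 0.9), ('ball', 0.7), ('tree', 0.4), ('cat', 0.3), ('house', 0.2)]
print(top_k_features(feature_list, coef_[0], k=5, by_magnitude=True))
# [('child', 0.9), ('apple', -0.8), ('ball', 0.7), ('tree', 0.4), ('cat', 0.3)]
```

For a multi-class model, coef_ has one row per class, so you would pick the row for the class you care about.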