sklearn SelectKBest: how to create a dict of {feature1: score, feature2: score, ...}
I want a clearer picture of what SelectKBest is doing.
I would like to see the scores of all features (selected or not) in a dictionary, so that I can plot them later.
So far I have tried
print(selector.scores_)
which gives me
[ 18.57570327 9.34670079 10.07245453 24.46765405 6.23420114
4.20497086 8.86672154 0.21705893 11.59554766 25.09754153
7.2427304 21.06000171 5.31257143 0.1641645 1.69882435]
or
print(sorted(selector.scores_, reverse=True)[:5])
or
from sklearn.feature_selection import SelectKBest, f_classif
selector = SelectKBest(f_classif, k=5)
selectedFeatures = selector.fit(features, labels)
# features_list[0] is the label 'poi', so column i of features is features_list[i+1]
selected_features_list = [features_list[i+1] for i in selectedFeatures.get_support(indices=True)]
features_list = features_list[:1]+selected_features_list
print('New feature_list after SelectKbest is\n', features_list, '\n')
print(sorted(selector.scores_, reverse=True)[:5])
This tells me which features were selected and what the five best scores are, but I cannot be sure the two are in the same order.
New feature_list after SelectKbest is
['poi', 'salary', 'total_stock_value', 'deferred_income', 'exercised_stock_options', 'bonus']
[25.097541528735491, 24.467654047526398, 21.060001707536571, 18.575703268041785, 11.595547659730601]
What I am looking for is:
[[best_feature, best_score],
[2nd_best_feature, 2nd_best_score],
[3rd_best_feature, 3rd_best_score],
and so on with all features]
Any ideas?
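A word of caution: a dictionary is an unordered object (on Python versions before 3.7), so putting sorted data into one does not really make sense there, but I have included the last step for you anyway.
First, combine your scores and names into a single object:
combined = list(zip(feature_names, scores))  # list() is needed on Python 3, where zip() is lazy
Then sort that object by score:
combined.sort(reverse=True, key=lambda x: x[1])
Then simply put your data into a dictionary:
dict(combined)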
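As a minimal Python 3 sketch of the three steps above: the names and scores below are the first four values from the question's output, and `feature_names` and `scores` are placeholders for your own feature list and `selector.scores_`. On Python 3.7 and later a plain dict keeps insertion order, so the resulting dictionary stays sorted best-first.

```python
# First four names and scores from the question's output, for illustration only.
feature_names = ["salary", "restricted_stock", "long_term_incentive", "total_stock_value"]
scores = [18.57570327, 9.34670079, 10.07245453, 24.46765405]

# Pair each name with its score and sort by score, highest first.
# sorted() accepts the lazy zip() iterator directly and returns a list.
ranked = sorted(zip(feature_names, scores), key=lambda pair: pair[1], reverse=True)

# dict() preserves insertion order on Python 3.7+, so keys come out best-first.
score_by_feature = dict(ranked)

print(ranked[0])
print(list(score_by_feature))
```

The list of tuples in `ranked` is usually the more convenient form for plotting, since the order is explicit.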
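Answering my own question.
For creating the dictionary:
all_scores_dict = {}
# `support` was never defined in the original snippet. selector.scores_ follows the column
# order of `features`, so score i belongs to features_list[i + 1] (index 0 is the label 'poi').
# features_list must be the original full list, not the one overwritten after selection.
for i, score in enumerate(selector.scores_):
    all_scores_dict[features_list[i + 1]] = score
For sorting (note that the result is now a list of tuples):
import operator
sorted_dict_scores = sorted(all_scores_dict.items(), key=operator.itemgetter(1), reverse=True)
This gives you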
[('exercised_stock_options', 25.097541528735491),
('total_stock_value', 24.467654047526398),
('bonus', 21.060001707536571),
('salary', 18.575703268041785),
('deferred_income', 11.595547659730601),
('long_term_incentive', 10.072454529369441),
('restricted_stock', 9.3467007910514877),
('total_payments', 8.8667215371077717),
('loan_advances', 7.2427303965360181),
('expenses', 6.2342011405067401),
('sum_of_unclassified', 5.31257142710212),
('other', 4.204970858301416),
('to_messages', 1.6988243485808501),
('deferral_payments', 0.2170589303395084),
('from_messages', 0.16416449823428736)]
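To also see which features were selected, one option is to carry the boolean mask alongside the scores. The helper `rank_features` below is hypothetical (it is not part of scikit-learn) and takes plain sequences, so it runs without a fitted selector. With a real selector you would presumably pass the original `features_list[1:]`, `selector.scores_`, and `selector.get_support()`, which returns a boolean mask by default. The sample values are the first four from the question.

```python
def rank_features(names, scores, selected_mask):
    """Return (name, score, selected) tuples sorted by score, best first."""
    if not (len(names) == len(scores) == len(selected_mask)):
        raise ValueError("names, scores and selected_mask must have equal length")
    rows = ((n, float(s), bool(m)) for n, s, m in zip(names, scores, selected_mask))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# First four features from the question; salary and total_stock_value
# were among the k=5 selected features, the other two were not.
names = ["salary", "restricted_stock", "long_term_incentive", "total_stock_value"]
scores = [18.57570327, 9.34670079, 10.07245453, 24.46765405]
mask = [True, False, False, True]

for name, score, selected in rank_features(names, scores, mask):
    print(f"{name:22s} {score:8.3f} {'selected' if selected else ''}")
```

Keeping the mask in the same tuple as the name and score avoids the index bookkeeping that caused the confusion in the question.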