Pandas sort not sorting data properly
I am trying to sort the results of feature_importances_ from sklearn.ensemble.RandomForestRegressor.
I have the following function:
def get_feature_importances(cols, importances):
    feats = {}
    for feature, importance in zip(cols, importances):
        feats[feature] = importance
    importances = pd.DataFrame.from_dict(feats, orient='index').rename(columns={0: 'Gini-importance'})
    importances.sort_values(by='Gini-importance')
    return importances
I use it like this:
importances = get_feature_importances(X_test.columns, rf.feature_importances_)
print()
print(importances)
I get the following results:
| PART | 0.035034 |
| MONTH1 | 0.02507 |
| YEAR1 | 0.020075 |
| MONTH2 | 0.02321 |
| YEAR2 | 0.017861 |
| MONTH3 | 0.042606 |
| YEAR3 | 0.028508 |
| DAYS | 0.047603 |
| MEDIANDIFF | 0.037696 |
| F2 | 0.008783 |
| F1 | 0.015764 |
| F6 | 0.017933 |
| F4 | 0.017511 |
| F5 | 0.017799 |
| SS22 | 0.010521 |
| SS21 | 0.003896 |
| SS19 | 0.003894 |
| SS23 | 0.005249 |
| SS20 | 0.005127 |
| RR | 0.021626 |
| HI_HOURS | 0.067584 |
| OI_HOURS | 0.054369 |
| MI_HOURS | 0.062121 |
| PERFORMANCE_FACTOR | 0.033572 |
| PERFORMANCE_INDEX | 0.073884 |
| NUMPA | 0.022445 |
| BUMPA | 0.024192 |
| ELOH | 0.04386 |
| FFX1 | 0.128367 |
| FFX2 | 0.083839 |
I thought the line importances.sort_values(by='Gini-importance') would sort them, but it does not. Why is this not working?
importances.sort_values(by='Gini-importance') returns a new, sorted DataFrame, and your function discards it. The original importances is left unchanged.
You want:
    return importances.sort_values(by='Gini-importance')
Or you can sort in place:
    importances.sort_values(by='Gini-importance', inplace=True)
    return importances
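For completeness, here is a minimal, self-contained sketch of the first fix. It is not the asker's exact code: it builds the DataFrame directly instead of going through a dict, and `ascending=False` is an assumption added here so the most important features come first (the original call sorts ascending by default). The `demo` values are a small sample copied from the output above, used only for illustration.

```python
import pandas as pd


def get_feature_importances(cols, importances):
    # Build a one-column DataFrame indexed by feature name.
    frame = pd.DataFrame(
        {'Gini-importance': list(importances)},
        index=list(cols),
    )
    # sort_values returns a new DataFrame, so return that result
    # rather than the unsorted frame.
    # ascending=False is an assumption: largest importance first.
    return frame.sort_values(by='Gini-importance', ascending=False)


# Illustrative sample only: four rows taken from the question's output.
demo = get_feature_importances(
    ['PART', 'MONTH1', 'FFX1', 'SS19'],
    [0.035034, 0.02507, 0.128367, 0.003894],
)
print(demo)
```

With a fitted model, the call stays the same as in the question: get_feature_importances(X_test.columns, rf.feature_importances_).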