Mapping an array to sort it in descending order on a Matplotlib chart?

I am trying to build a bar chart in which the bars are displayed in descending order.

In my code, the numpy array is the result of using SelectKBest() to select the best features in a machine learning problem, depending on their variance.

import numpy as np
import matplotlib.pyplot as plt 

flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']

fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
  8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ]) # this is the numpy.ndarray after running SelectKBest()

# The scores map onto flist by position, e.g. 250.14 relates to 'int_rate' and 218.77 relates to 'inq_last_6mths',
# so the 4 most important features are 'int_rate', 'fico', 'inq_last_6mths' and 'revol_util'.
print(fimportance)
[250.14120228  23.95686725  10.71979245  13.38566487 219.41737141
  8.19261323  27.69341779  64.96469182 218.77495366  22.7037686 ]

So I want to show these values in descending order on my bar chart, with int_rate at the top.

fimportance_sorted = np.sort(fimportance)[::-1]  # np.sort sorts ascending; reversed to match the descending output below
fimportance_sorted

array([250.14120228, 219.41737141, 218.77495366,  64.96469182,
        27.69341779,  23.95686725,  22.7037686 ,  13.38566487,
        10.71979245,   8.19261323])

#  this bar chart is not right because here the values and indices are messed up.
plt.barh(flist, fimportance_sorted)
plt.show()

Next I tried this.

plt.barh([x for x in range(len(fimportance))], fimportance)

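I know I need to somehow map these indices to the flist values and then sort them, perhaps by creating an array and then mapping my list labels onto it instead of its indices. I am stuck here.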

for i,v in enumerate(fimportance):
    arr = np.array([i,v])

.....

Thanks for your help with this problem.

"the values and indices are messed up"

That is because you sorted fimportance (with np.sort) but left the order of the labels in flist unchanged, so the labels no longer correspond to the values in fimportance_sorted.

You can use numpy.argsort to get the indices that would put fimportance into sorted order, then index both flist and fimportance with those indices:

>>> import numpy as np
>>> flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
>>> fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
...   8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ])
>>> idx = np.argsort(fimportance)
>>> idx
array([5, 2, 3, 9, 1, 6, 7, 8, 4, 0])
>>> flist[idx]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.array(flist)[idx]
array(['days_with_cr_line', 'log_annual_inc', 'dti', 'pub_rec',
       'installment', 'revol_bal', 'revol_util', 'inq_last_6mths', 'fico',
       'int_rate'], dtype='<U17')
>>> fimportance[idx]
array([  8.19261323,  10.71979245,  13.38566487,  22.7037686 ,
        23.95686725,  27.69341779,  64.96469182, 218.77495366,
       219.41737141, 250.14120228])

idx is the order in which the elements of fimportance must be placed to sort it. The order of flist has to match the order of fimportance, so index both with idx.

As a result, the elements of np.array(flist)[idx] correspond to the elements of fimportance[idx].
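
Putting it together for the chart, here is a minimal sketch that reuses flist and fimportance from the question. The names labels_sorted and values_sorted and the "score" axis label are my own additions, not from the original code. Since barh draws the first entry at the bottom, the ascending order from argsort should leave int_rate as the top bar.

```python
import numpy as np
import matplotlib.pyplot as plt

flist = ['int_rate', 'installment', 'log_annual_inc', 'dti', 'fico',
         'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths', 'pub_rec']
fimportance = np.array([250.14120228, 23.95686725, 10.71979245, 13.38566487, 219.41737141,
                        8.19261323, 27.69341779, 64.96469182, 218.77495366, 22.7037686])

# Indices that sort fimportance in ascending order
idx = np.argsort(fimportance)

# Apply the same indices to labels and values so they stay paired
labels_sorted = np.array(flist)[idx]   # hypothetical name, for illustration
values_sorted = fimportance[idx]       # hypothetical name, for illustration

fig, ax = plt.subplots()
# barh places the first entry at the bottom, so the largest value ends up on top
ax.barh(labels_sorted, values_sorted)
ax.set_xlabel("score")  # assumed axis label
fig.tight_layout()
# plt.show()  # uncomment when running interactively
```

If you use vertical bars (plt.bar) instead and want the largest value first, reversing the indices with idx[::-1] before indexing should do it.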