Mapping an array to sort it in descending order on a Matplotlib chart?
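I am trying to build a bar chart whose bars are displayed in descending order.
In my code, the numpy array is the result of using SelectKBest() to select the best features in a machine learning problem, based on their variance.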
import numpy as np
import matplotlib.pyplot as plt
flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ]) # this is the numpy.ndarray after running SelectKBest()
print(fimportance) # this gives me 'int_rate', 'fico', 'revol_util', 'inq_last_6mths' as the 4 most important features, since the values map to flist by position, e.g. 250 relates to 'int_rate' and 218 relates to 'inq_last_6mths'.
[250.14120228 23.95686725 10.71979245 13.38566487 219.41737141
8.19261323 27.69341779 64.96469182 218.77495366 22.7037686 ]
So I want to show these values in descending order on my bar chart, with int_rate at the top.
fimportance_sorted = np.sort(fimportance)
fimportance_sorted
array([250.14120228, 219.41737141, 218.77495366, 64.96469182,
27.69341779, 23.95686725, 22.7037686 , 13.38566487,
10.71979245, 8.19261323])
# this bar chart is not right because here the values and indices are messed up.
plt.barh(flist, fimportance_sorted)
plt.show()
Next I tried this.
plt.barh([x for x in range(len(fimportance))], fimportance)
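I know I need to somehow map these indices to the flist values and then sort them. Maybe by creating an array and then mapping my list labels instead of its indices. This is where I am stuck.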
for i,v in enumerate(fimportance):
    arr = np.array([i,v])
    .....
Thank you for your help with this problem.
the values and indices are messed up
That's because you sorted fimportance (fimportance_sorted = np.sort(fimportance)), but the order of the labels in flist stayed the same, so the labels no longer correspond to the values in fimportance_sorted.
You can use numpy.argsort to get the indices that would put fimportance into sorted order, and then index both flist and fimportance with those indices:
>>> import numpy as np
>>> flist = ['int_rate', 'installment', 'log_annual_inc','dti', 'fico', 'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths','pub_rec']
>>> fimportance = np.array([250.14120228,23.95686725,10.71979245,13.38566487,219.41737141,
... 8.19261323,27.69341779,64.96469182,218.77495366,22.7037686 ])
>>> idx = np.argsort(fimportance)
>>> idx
array([5, 2, 3, 9, 1, 6, 7, 8, 4, 0])
>>> flist[idx]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: only integer scalar arrays can be converted to a scalar index
>>> np.array(flist)[idx]
array(['days_with_cr_line', 'log_annual_inc', 'dti', 'pub_rec',
'installment', 'revol_bal', 'revol_util', 'inq_last_6mths', 'fico',
'int_rate'], dtype='<U17')
>>> fimportance[idx]
array([ 8.19261323, 10.71979245, 13.38566487, 22.7037686 ,
23.95686725, 27.69341779, 64.96469182, 218.77495366,
219.41737141, 250.14120228])
idx is the order in which you need to place the elements of fimportance to sort them. The order of flist must match the order of fimportance, so index both with idx.
As a result, the elements of np.array(flist)[idx] correspond to the elements of fimportance[idx].
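To tie this back to the chart, here is a minimal sketch of how the argsort indices could be used with barh. The helper name sorted_barh and the "score" axis label are made up for illustration and are not part of the question. Because barh draws the first bar at the bottom, the ascending order returned by argsort already puts the largest value (int_rate) at the top.

```python
import numpy as np
import matplotlib.pyplot as plt

flist = ['int_rate', 'installment', 'log_annual_inc', 'dti', 'fico',
         'days_with_cr_line', 'revol_bal', 'revol_util', 'inq_last_6mths',
         'pub_rec']
fimportance = np.array([250.14120228, 23.95686725, 10.71979245, 13.38566487,
                        219.41737141, 8.19261323, 27.69341779, 64.96469182,
                        218.77495366, 22.7037686])


def sorted_barh(labels, scores, ax=None):
    """Hypothetical helper: horizontal bars with the largest score on top."""
    labels = np.asarray(labels)   # a plain list cannot be indexed by an array
    scores = np.asarray(scores)
    # argsort gives ascending order; barh places the first bar at the
    # bottom, so the largest score ends up at the top of the chart.
    idx = np.argsort(scores)
    if ax is None:
        _, ax = plt.subplots()
    # Index labels and scores with the same idx so they stay paired.
    ax.barh(labels[idx], scores[idx])
    ax.set_xlabel("score")
    return ax, labels[idx], scores[idx]


ax, sorted_labels, sorted_scores = sorted_barh(flist, fimportance)
# Call plt.show() or ax.figure.savefig(...) afterwards to view the chart.
```

If you would rather have the largest bar at the bottom, reversing the indices with idx[::-1] before indexing should do it.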