Overlapping/crowded labels on y-axis python

Question

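I'm in a bit of a rush to finish a presentation for the project owner tomorrow. We are a small group of economics students in Germany trying to learn machine learning with Python. We set up a random forest classifier and badly want to show the estimator's most important features in a clean plot. After some googling we arrived at the solution below. It does the job, but we are not happy with it because the labels on the y-axis overlap. This is the code we use: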

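import numpy as np
import matplotlib.pyplot as plt

# clf: a fitted search object (e.g. GridSearchCV); df_year_four: DataFrame of the features
feature_importances = clf.best_estimator_.feature_importances_
# Scale so that the most important feature gets the value 100
feature_importances = 100 * (feature_importances / feature_importances.max())
# Indices that sort the importances in ascending order
sorted_idx = np.argsort(feature_importances)

pos = np.arange(sorted_idx.shape[0])
plt.barh(pos, feature_importances[sorted_idx], align='center', height=0.8)
plt.yticks(pos, df_year_four.columns[sorted_idx])
plt.show()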

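For privacy reasons I'll just describe the problem: the feature names on the y-axis overlap (there are about 30 of them). I've been going through the matplotlib documentation to figure out how to fix this myself, but unfortunately I couldn't find anything useful. It seems that training and testing a model is easier than understanding matplotlib and making plots :D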

Thanks a lot for your help and your time, I really appreciate it.

Answer 1

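The np.argsort you use returns a numpy array containing many indices. You then use that whole array as the labels for the y-axis, which is why the labels overlap.

My suggestion is to index into sorted_idx, for example: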

plt.yticks(pos, df_year_four.columns[sorted_idx[0]])

This will plot only one label.
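To make the point about np.argsort concrete, here is a small sketch with hypothetical importances and column names (not the original data). np.argsort returns one index per feature in ascending order of importance, so indexing the column names with the whole array gives one label per feature, while sorted_idx[0] is a single index: the least important feature.

```python
import numpy as np

# Hypothetical importances and column names for four features
importances = np.array([0.10, 0.40, 0.05, 0.45])
names = np.array(["a", "b", "c", "d"])

# argsort sorts ascending: least important feature first
sorted_idx = np.argsort(importances)
print(sorted_idx)            # [2 0 1 3]
print(names[sorted_idx])     # ['c' 'a' 'b' 'd'], one label per feature
print(names[sorted_idx[0]])  # c, a single name (the least important feature)
```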

Answer 2

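Got it, guys! A 'Geistesblitz', as we say in Germany (a flash of inspiration)! See the feature_importances variable in the np.argsort line? Using feature_importances[:-15] there shows only the upper half of the features and gives the y-axis room to breathe. Yes!!! This works well for us because there are a lot of less important features anyway.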
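One caveat: feature_importances[:-15] drops the last 15 entries in column order, which are not necessarily the 15 least important features. If the goal is to keep the 15 largest importances regardless of column order, one option is to slice the sorted index array instead. The sketch below assumes that goal. plot_top_features is a hypothetical helper, the data is random stand-in data for 30 features, and the Agg backend is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np


def plot_top_features(importances, names, top_n=15):
    """Horizontal bar chart of the top_n most important features (hypothetical helper)."""
    importances = np.asarray(importances, dtype=float)
    names = np.asarray(names)
    # Scale relative to the largest importance, as in the question
    scaled = 100 * (importances / importances.max())
    # argsort is ascending, so the last top_n indices belong to the most important features
    top_idx = np.argsort(scaled)[-top_n:]
    pos = np.arange(top_idx.shape[0])
    fig, ax = plt.subplots()
    ax.barh(pos, scaled[top_idx], align="center", height=0.8)
    ax.set_yticks(pos)
    ax.set_yticklabels(names[top_idx])
    ax.set_xlabel("Relative importance (%)")
    fig.tight_layout()
    return fig, ax


# Usage with random stand-in data for 30 features (not the original data)
rng = np.random.default_rng(0)
demo_importances = rng.random(30)
demo_names = np.array([f"feature_{i}" for i in range(30)])
fig, ax = plot_top_features(demo_importances, demo_names, top_n=15)
```

With the question's variables this amounts to replacing sorted_idx with np.argsort(feature_importances)[-15:] and leaving the rest of the code unchanged.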

Answer 3

I saw your solution and would like to add this link here to explain why it works:

The spacing between ticklabels is exclusively determined by the space between ticks on the axes. Therefore the only way to obtain more space between given ticklabels is to make the axes larger.
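A quick back-of-the-envelope calculation shows why about 30 labels collide at the default size. It assumes matplotlib's default figure height of 4.8 inches and the default subplot margins (bottom=0.11, top=0.88), so the axes span 0.77 of the figure height. label_slot_pt is a hypothetical helper for this estimate.

```python
def label_slot_pt(fig_height_in, n_labels, axes_frac=0.77):
    """Approximate vertical space, in points, available per y tick label.

    axes_frac=0.77 assumes matplotlib's default subplot margins
    (bottom=0.11, top=0.88).
    """
    axes_height_pt = fig_height_in * axes_frac * 72  # 72 points per inch
    return axes_height_pt / n_labels


# Default figure height (4.8 in) with about 30 features, as in the question
print(round(label_slot_pt(4.8, 30), 1))  # 8.9
```

Each tick label gets roughly 8.9 pt of height, while the default tick label font size is 10 pt. Neighbouring labels therefore overlap, and the only ways out are fewer labels or a taller axes.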

The question I linked shows that if you make the figure large enough, your axis labels will naturally be spaced out better.
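Following that idea, a minimal sketch that keeps all features but lets the figure height grow with the number of labels. feature_bar_chart is a hypothetical helper. The value inches_per_label=0.3 (about 21.6 pt per label) is an assumed setting to tune, not a matplotlib default. The data is random stand-in data, and the Agg backend is only there so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line and call plt.show() to view
import matplotlib.pyplot as plt
import numpy as np


def feature_bar_chart(importances, names, inches_per_label=0.3, min_height=4.8):
    """Bar chart whose figure height grows with the number of labels (hypothetical helper).

    inches_per_label is an assumed spacing; min_height keeps small charts at the default height.
    """
    importances = np.asarray(importances, dtype=float)
    names = np.asarray(names)
    sorted_idx = np.argsort(importances)  # ascending: most important bar ends up at the top
    pos = np.arange(sorted_idx.shape[0])
    height = max(min_height, inches_per_label * len(names))
    fig, ax = plt.subplots(figsize=(8, height))
    ax.barh(pos, importances[sorted_idx], align="center", height=0.8)
    ax.set_yticks(pos)
    ax.set_yticklabels(names[sorted_idx])
    fig.tight_layout()  # make room for long feature names on the left
    return fig, ax


# Usage with random stand-in data for 30 features (not the original data)
rng = np.random.default_rng(1)
names = np.array([f"feature_{i}" for i in range(30)])
fig, ax = feature_bar_chart(rng.random(30), names)
```

With the question's own code, the smallest equivalent change is calling plt.figure(figsize=(8, 9)) before plt.barh, so that the bars are drawn on a taller figure.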

scrum

machine-learning

matplotlib

python-3.x

random-forest