Matplotlib bar chart: show x-ticks only at non-zero bars

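I have to make a (stacked) bar chart with roughly 3000 positions on the x-axis. Many of those positions contain no bar, yet they are still labelled on the x-axis, which makes the plot hard to read. Is there a way to show x-ticks only for the existing (stacked) bars? The spacing between bars, based on their x-tick values, must be preserved. How can I solve this in matplotlib? Is there a more suitable plot type than a stacked bar chart? I am building the plot from a pandas cross-table (pd.crosstab()).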

Link to an image of the plot: https://i.stack.imgur.com/qk99z.png

An example of my dataframe (thanks to gepcel):

import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1)>0]

Basically, since no example was given: select the ticks whose total (i.e. stacked) value is greater than zero, then set the xticks and xticklabels manually.

Suppose you have a dataframe like the following:

import pandas as pd
import numpy as np
N = 3200
df = pd.DataFrame(np.random.randint(1, 5, size=(N, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0

Then the selected data would be:

df_select = df[df.sum(axis=1)>0]

Then you can plot the stacked bar chart, for example:

import matplotlib.pyplot as plt

# width=20 so that the bars are not too thin to see
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
        bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
        bottom=df_select[0]+df_select[1])
# Only show the selected ticks. It is a little tricky if you want
# the tick labels to differ from the tick positions, and it is
# still hard to keep the tick labels from overlapping.
plt.xticks(df_select.index)
plt.legend()
plt.show()
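
The comments in the code mention two loose ends: tick labels that differ from the tick positions, and overlapping labels. A minimal sketch of one way to handle both, assuming the Axes-level calls set_xticks and set_xticklabels with rotated text, is given here. The fixed seed and the "pos N" label strings are assumptions made purely for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same kind of sparse frame as above; the fixed seed is an assumption
rng = np.random.default_rng(0)
N = 3200
df = pd.DataFrame(rng.integers(1, 5, size=(N, 3)))
df.loc[rng.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

fig, ax = plt.subplots()
bottom = np.zeros(len(df_select))
for col in df_select.columns:
    # each column is stacked on the running total of the previous ones
    ax.bar(df_select.index, df_select[col], width=20,
           bottom=bottom, label=str(col))
    bottom += df_select[col].to_numpy()

# Ticks stay at the real x positions; the label strings are hypothetical
tick_labels = [f"pos {i}" for i in df_select.index]
ax.set_xticks(df_select.index.to_numpy())
ax.set_xticklabels(tick_labels, rotation=90, fontsize=8)
ax.legend()
fig.tight_layout()
```

Calling plt.show() afterwards displays the figure. Rotating the labels reduces overlap but does not remove it when two non-zero positions are very close together.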

With plt.xticks(df_select.index), the result should look like this:

Update:

It is easy to put text on top of the bars like this:

for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')

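That is, compute the position of the top of each stacked bar (row.sum()), place the text there with a small offset (e.g. +0.2), and use rotation=90 to rotate it. The complete code is: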

import matplotlib.pyplot as plt

df_select = df[df.sum(axis=1)>0]
plt.bar(df_select.index, df_select[0], width=20, label='0')
plt.bar(df_select.index, df_select[1], width=20, label='1',
        bottom=df_select[0])
plt.bar(df_select.index, df_select[2], width=20, label='2',
        bottom=df_select[0]+df_select[1])

# Here is the part that puts the text on top of each stacked bar:
for n, row in df_select.iterrows():
    plt.text(n, row.sum()+0.2, n, ha='center', rotation=90, va='bottom')

plt.xticks(df_select.index)
plt.legend()
plt.show()
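
As an alternative to computing the text positions by hand, Matplotlib 3.4 and newer provide Axes.bar_label, which can annotate the topmost segment of each stack. This is only a sketch under that version assumption, not part of the original answer. The fixed seed and the 20% vertical margin are illustrative choices.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Same kind of sparse frame as above; the fixed seed is an assumption
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(1, 5, size=(3200, 3)))
df.loc[rng.choice(df.index, size=3190, replace=False), :] = 0
df_select = df[df.sum(axis=1) > 0]

fig, ax = plt.subplots()
bottom = np.zeros(len(df_select))
for col in df_select.columns:
    # after the loop, `top` holds the container of the last (topmost) segment
    top = ax.bar(df_select.index, df_select[col], width=20,
                 bottom=bottom, label=str(col))
    bottom += df_select[col].to_numpy()

# Label the end of the topmost segment with the x position of the bar
labels = [str(i) for i in df_select.index]
texts = ax.bar_label(top, labels=labels, rotation=90, padding=3)

ax.margins(y=0.2)  # extra headroom so rotated labels are not clipped
ax.set_xticks(df_select.index.to_numpy())
ax.legend()
```

Here padding=3 plays the role of the +0.2 offset in the plt.text version, except that it is measured in points rather than data units.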

Result of the plt.text version:

Here is gepcel's answer, adapted to dataframes with a varying number of columns:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Here the dataframe has 3 columns, but the code is meant to
# adapt to dataframes with any number of columns
df = pd.DataFrame(np.random.randint(1, 5, size=(3200, 3)))
df.loc[np.random.choice(df.index, size=3190, replace=False), :] = 0

# keep only the rows that contain at least one non-zero bar
df_select = df[df.sum(axis=1) > 0]
fig, ax = plt.subplots()

ax.bar(df_select.index, df_select.iloc[:, 0], label=df_select.columns[0])

# stack each remaining column on top of the sum of the previous ones
for i in range(1, df_select.shape[1]):
    bottom = df_select.iloc[:, :i].sum(axis=1)
    ax.bar(df_select.index, df_select.iloc[:, i], bottom=bottom,
           label=df_select.columns[i])

ax.set_xticks(df_select.index)
plt.legend(loc='best', bbox_to_anchor=(1, 0.5))
plt.xticks(rotation=90, fontsize=8)
plt.show()
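
Since the question builds its table with pd.crosstab(), the same idea can be sketched starting from long-form records. A crosstab contains rows only for positions that actually occur, so its index can be used directly as the tick positions. The records frame and its column names "position" and "category" are hypothetical and serve only as an illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical long-form records: one row per observation
records = pd.DataFrame({
    "position": [12, 12, 480, 480, 480, 1500, 2999],
    "category": ["a", "b", "a", "a", "c", "b", "c"],
})

# Rows: positions that occur at least once; columns: categories
table = pd.crosstab(records["position"], records["category"])

fig, ax = plt.subplots()
bottom = np.zeros(len(table))
for col in table.columns:
    # stack each category on the running total of the previous ones
    ax.bar(table.index, table[col], width=20, bottom=bottom, label=col)
    bottom += table[col].to_numpy()

# Ticks only where the crosstab has a row, spaced by their real x values
ax.set_xticks(table.index.to_numpy())
ax.tick_params(axis="x", labelrotation=90)
ax.legend()
```

If the crosstab has been reindexed to cover every position, rows that are all zero can be dropped first with table[table.sum(axis=1) > 0], as in the answers above.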