How to do a Pandas stacked bar chart on a number line instead of categories

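I'm trying to make a stacked bar chart where the x-axis is based on a regular number line instead of categories. Maybe "bar chart" isn't the right term?

How can I make a stacked bar chart while keeping the x number line spaced "normally" (with a large relative gap between 5.0 and 10.6)? I'd also like to set a fixed tick interval instead of labeling every bar. (The real dataset is dense but has some spurious gaps, and I want to use the bar colors to qualitatively show the changes as a function of x.)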

import pandas as pd
import matplotlib.pyplot as plt

fid = ["name", "name", "name", "name", "name"]
x = [1.02, 1.3, 2, 5, 10.6]
y1 = [0, 1, 0.2, 0.6, 0.1]
y2 = [0.3, 0, 0.1, 0.1, 0.4]
y3 = [0.7, 0, 0.7, 0.3, 0.5]
df = pd.DataFrame(data=zip(fid, x, y1, y2, y3), columns=["fid", "x", "y1", "y2", "y3"])

fig, ax = plt.subplots()
df.plot.bar(x="x", stacked=True, ax=ax)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)

With DataFrame.plot.bar, the x values are treated as categorical data, so the bars are always drawn along range(0, ...) and the ticks are relabeled with the x values.
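A quick way to see this is to inspect the tick positions after plotting. This is only a small sketch with a cut-down copy of the question's data (the name `demo` is made up for illustration):

```python
import matplotlib.pyplot as plt
import pandas as pd

# Cut-down copy of the question's data (illustrative only)
demo = pd.DataFrame({"x": [1.02, 1.3, 2, 5, 10.6],
                     "y1": [0, 1, 0.2, 0.6, 0.1]})

ax = demo.plot.bar(x="x")

# Positions are plain bar indices; the x values survive only as label text
tick_positions = [float(t) for t in ax.get_xticks()]
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_positions)
print(tick_labels)
```

The positions come out evenly spaced no matter how far apart the x values are, which is why the gap between 5.0 and 10.6 disappears.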

To scale the distance between bars, reindex the x values so that filler rows are inserted between the real data points:

start, stop = 0, 16
xstep = 0.01
tickstep = 2

xfill = np.round(np.arange(start, stop + xstep, xstep), 2)
out = df.set_index("x").reindex(xfill).reset_index()

ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))
xticklabels = np.arange(start, stop+tickstep, tickstep).astype(float)
xticks = out.index[out.x.isin(xticklabels)]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)


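Details

  1. xfill is generated as [0, 0.01, 0.02, ...]. I tried to make this portable by extracting the maximum number of decimal places from x, but floating-point precision is always tricky, so this may need adjusting: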

    decimals = df.x.astype(str).str.split(".").str[-1].str.len().max()
    xstep = 10.0 ** -decimals
    start = 0
    stop = 16
    
    xfill = np.round(np.arange(start, stop + xstep, xstep), decimals)
    # array([ 0.  ,  0.01,  0.02,  0.03,  0.04,  0.05,  ...])
    
  2. reindex the x column against this new xfill, so the filler rows will be NaN:

    out = df.set_index("x").reindex(xfill).reset_index()
    #     x   fid   y1   y2   y3
    #  0.00   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    #  1.01   NaN  NaN  NaN  NaN
    #  1.02  name  0.0  0.3  0.7
    #  1.03   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    #  1.29   NaN  NaN  NaN  NaN
    #  1.30  name  1.0  0.0  0.0
    #  1.31   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    #  1.99   NaN  NaN  NaN  NaN
    #  2.00  name  0.2  0.1  0.7
    #  2.01   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    #  4.99   NaN  NaN  NaN  NaN
    #  5.00  name  0.6  0.1  0.3
    #  5.01   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    # 10.59   NaN  NaN  NaN  NaN
    # 10.60  name  0.1  0.4  0.5
    # 10.61   NaN  NaN  NaN  NaN
    #   ...   ...  ...  ...  ...
    # 16.00   NaN  NaN  NaN  NaN
    
  3. Plot the reindexed data (with xticks spaced at tickstep):

    ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))
    
    tickstep = 2
    xticklabels = np.arange(start, stop + tickstep, tickstep).astype(float)
    xticks = out.index[out.x.isin(xticklabels)]
    ax.set_xticks(xticks)
    ax.set_xticklabels(xticklabels)
    
    plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
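
Since step 1 warns about floating-point precision, here is one possible way to make the grid more robust. It is only a sketch, not part of the steps above: it assumes `decimals` is known (hard-coded to 2 here), counts the grid points as integers, and rounds the data's x values the same way before reindexing so that the lookups match. The names `scale` and `n_steps` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
})

decimals = 2            # assumed known; the answer derives it from the data
start, stop = 0, 16
scale = 10 ** decimals

# Count grid points as integers, then divide once, so the number of rows
# does not depend on how a float step accumulates inside np.arange
n_steps = (stop - start) * scale
xfill = np.round(start + np.arange(n_steps + 1) / scale, decimals)

# Round the real x values the same way so they land exactly on the grid
out = (
    df.assign(x=df["x"].round(decimals))
      .set_index("x")
      .reindex(xfill)
      .reset_index()
)

print(len(xfill))
print(out["y1"].notna().sum())  # number of real data rows kept
```

If the second number is smaller than the number of rows in df, some x values did not land on the grid and `decimals` needs to be larger.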
    

Combined code:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.DataFrame({"fid": ["name", "name", "name", "name", "name"], "x": [1.02, 1.3, 2, 5, 10.6], "y1": [0, 1, 0.2, 0.6, 0.1], "y2": [0.3, 0, 0.1, 0.1, 0.4], "y3": [0.7, 0, 0.7, 0.3, 0.5]})

decimals = df.x.astype(str).str.split(".").str[-1].str.len().max()
xstep = 10.0 ** -decimals
start = 0
stop = 16

xfill = np.round(np.arange(start, stop + xstep, xstep), decimals)
out = df.set_index("x").reindex(xfill).reset_index()
ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))

tickstep = 2
xticklabels = np.arange(start, stop + tickstep, tickstep).astype(float)
xticks = out.index[out.x.isin(xticklabels)]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)

plt.legend(bbox_to_anchor=(1.05, 1), loc=2)

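My answer below shows how to do the stacking with real spacing on the x-axis. You can take this solution and tailor the function to your needs; for example, you don't need itertools and can just use a regular counter. You can also adjust the parameters as needed.

The idea behind the solution:

  1. Use cumsum to compute the stacked heights
  2. Draw the bars with matplotlib (plain bars, not stacked) and use zorder to control which one is in front.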
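
To make the two points concrete, here is a small sketch on the question's data showing what cumsum produces and which zorder each column would get. The names `tops` and `zorders` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
    "y2": [0.3, 0, 0.1, 0.1, 0.4],
    "y3": [0.7, 0, 0.7, 0.3, 0.5],
})
ycols = ["y1", "y2", "y3"]

# Row-wise cumulative sum: each column becomes the top edge of its segment
tops = df[ycols].cumsum(axis=1)

# Every bar is drawn from zero, so the shorter (earlier) columns must sit
# in front of the taller ones: y1 gets the highest zorder, y3 the lowest
zorders = {col: -i for i, col in enumerate(ycols)}

print(tops.round(2))
print(zorders)
```

Because each bar starts at zero, only the part of a taller bar that sticks out above the bar in front of it stays visible, which gives the stacked look.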

Function

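from itertools import count
from math import floor

import matplotlib.pyplot as plt

def plt_stack_spacing(df, figsize=(10, 6), width=0.2, bb_anchor=(1.05, 1)):
    # Assumes the first two columns are fid and x; the rest are the y series
    ycol = df.columns[2:]
    df1 = df.iloc[:, [0, 1]]
    # Cumulative sum across the y columns gives the top of each stacked segment
    df1 = df1.join(df[ycol].cumsum(axis=1))

    # Decreasing zorder (0, -1, -2, ...); a manually adjusted counter works as well
    c = count(0, -1)

    plt.figure(figsize=figsize)
    for col in ycol:
        # Every bar starts at 0; earlier (shorter) bars are drawn in front
        plt.bar(df1.x, df1[col], label=col, width=width, zorder=next(c))

    # Integer ticks covering the x range
    xmin = floor(df.x.min())
    xmax = floor(df.x.max())
    xt = [*range(xmin, xmax + 2)]

    plt.xticks(xt)
    plt.legend(bbox_to_anchor=bb_anchor, loc=2)
    plt.show()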

Calling the function:

plt_stack_spacing(df,(18,5),0.2,(1.01,1))

Output:

Benchmark: 100 plots of 300 rows and 4 columns (y1, y2, y3, y4) took 225 seconds = 3.75 minutes, with no enhancements.
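
As a variation, and not the approach used in the function above, the same picture can probably be had with the `bottom` argument of matplotlib's `Axes.bar` instead of zorder, plus a `MultipleLocator` for the fixed tick interval the question asks about. This is a rough sketch; the function name `stacked_bars_on_number_line` and its parameters are made up for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import MultipleLocator


def stacked_bars_on_number_line(df, xcol, ycols, width=0.2, tickstep=2,
                                figsize=(10, 3)):
    """Hypothetical helper: stack ycols at their true x positions."""
    fig, ax = plt.subplots(figsize=figsize)
    bottom = np.zeros(len(df))
    for col in ycols:
        heights = df[col].to_numpy(dtype=float)
        # Each segment starts where the previous one ended
        ax.bar(df[xcol], heights, width=width, bottom=bottom, label=col)
        bottom = bottom + heights
    # One major tick every `tickstep` units instead of one per bar
    ax.xaxis.set_major_locator(MultipleLocator(tickstep))
    ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
    return ax


df = pd.DataFrame({
    "fid": ["name"] * 5,
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
    "y2": [0.3, 0, 0.1, 0.1, 0.4],
    "y3": [0.7, 0, 0.7, 0.3, 0.5],
})
ax = stacked_bars_on_number_line(df, "x", ["y1", "y2", "y3"])
# Call plt.show() to display the figure
```

Here the bars sit directly on a numeric axis, so no filler rows are needed, but `width` is in data units and has to be chosen smaller than the closest pair of x values to avoid overlap.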