How to make a Pandas stacked bar chart on a number line instead of categories
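I am trying to make a stacked bar chart where the x-axis is based on a regular number line instead of categories. Maybe "bar chart" is not the right term?
How can I make a stacked bar chart while keeping the x number line "normally" spaced (so that there is a large relative gap between 5.0 and 10.6)? I would also like to set a fixed tick interval rather than labeling every bar. (The real dataset is dense but has some spurious gaps, and I want to use the bar colors to show qualitatively how things change as a function of x.)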
import pandas as pd
import matplotlib.pyplot as plt
fid = ["name", "name", "name", "name", "name"]
x = [1.02, 1.3, 2, 5, 10.6]
y1 = [0, 1, 0.2, 0.6, 0.1]
y2 = [0.3, 0, 0.1, 0.1, 0.4]
y3 = [0.7, 0, 0.7, 0.3, 0.5]
df = pd.DataFrame(data=zip(fid, x, y1, y2, y3), columns=["fid", "x", "y1", "y2", "y3"])
fig, ax = plt.subplots()
df.plot.bar(x="x", stacked=True, ax=ax)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
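In a pandas bar plot (df.plot.bar, which draws with matplotlib), the x values are treated as categorical data, so the bars are always drawn along range(0, ...) and the ticks are relabeled with the x values.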
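A quick way to see this for yourself is to inspect where the bars actually land. The following is only a minimal sketch using the question's x values; the names positions and labels are made up for illustration, and the Agg backend is chosen just so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import pandas as pd

df = pd.DataFrame({"x": [1.02, 1.3, 2, 5, 10.6], "y1": [0, 1, 0.2, 0.6, 0.1]})
ax = df.plot.bar(x="x", y="y1")

# Centre of each bar in data coordinates: evenly spaced, not at the x values
positions = [patch.get_x() + patch.get_width() / 2 for patch in ax.patches]
# The x values only survive as tick label text
labels = [tick.get_text() for tick in ax.get_xticklabels()]

print(positions)
print(labels)
```

The bar centres come out evenly spaced one unit apart, which is why the gap between 5.0 and 10.6 looks the same as the gap between 1.02 and 1.3.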
To space the bars proportionally, reindex the x values so that filler rows are inserted between the actual data points:
start, stop = 0, 16
xstep = 0.01
tickstep = 2
xfill = np.round(np.arange(start, stop + xstep, xstep), 2)
out = df.set_index("x").reindex(xfill).reset_index()
ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))
xticklabels = np.arange(start, stop+tickstep, tickstep).astype(float)
xticks = out.index[out.x.isin(xticklabels)]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
Details
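Generate xfill as [0, 0.01, 0.02, ...]. I tried to make this portable by extracting the maximum number of decimal places from x, but floating-point precision is always tricky, so this may need tweaking: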
decimals = df.x.astype(str).str.split(".").str[-1].str.len().max()
xstep = 10.0 ** -decimals
start = 0
stop = 16
xfill = np.round(np.arange(start, stop + xstep, xstep), decimals)
# array([ 0. , 0.01, 0.02, 0.03, 0.04, 0.05, ...])
reindex the x column against this new xfill, so the filler rows will be NaN:
out = df.set_index("x").reindex(xfill).reset_index()
# x fid y1 y2 y3
# 0.00 NaN NaN NaN NaN
# ... ... ... ... ...
# 1.01 NaN NaN NaN NaN
# 1.02 name 0.0 0.3 0.7
# 1.03 NaN NaN NaN NaN
# ... ... ... ... ...
# 1.29 NaN NaN NaN NaN
# 1.30 name 1.0 0.0 0.0
# 1.31 NaN NaN NaN NaN
# ... ... ... ... ...
# 1.99 NaN NaN NaN NaN
# 2.00 name 0.2 0.1 0.7
# 2.01 NaN NaN NaN NaN
# ... ... ... ... ...
# 4.99 NaN NaN NaN NaN
# 5.00 name 0.6 0.1 0.3
# 5.01 NaN NaN NaN NaN
# ... ... ... ... ...
# 10.59 NaN NaN NaN NaN
# 10.60 name 0.1 0.4 0.5
# 10.61 NaN NaN NaN NaN
# ... ... ... ... ...
# 16.00 NaN NaN NaN NaN
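Because reindex matches on exact float equality, it may be worth checking that no original row was silently dropped. This is a hedged sketch rather than part of the answer's code: it assumes two decimals, rounds the data's x the same way as xfill, and uses the illustrative names indexed, missing and kept.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "fid": ["name"] * 5,
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
    "y2": [0.3, 0, 0.1, 0.1, 0.4],
    "y3": [0.7, 0, 0.7, 0.3, 0.5],
})

decimals = 2  # assumption: matches the precision of the x values
xstep = 10.0 ** -decimals
xfill = np.round(np.arange(0, 16 + xstep, xstep), decimals)

# Round the data's x the same way so both sides hold identical float values
indexed = df.assign(x=df["x"].round(decimals)).set_index("x")

# Any x value listed here would be lost by the reindex
missing = indexed.index.difference(pd.Index(xfill))

out = indexed.reindex(xfill).reset_index()
kept = int(out["y1"].notna().sum())  # rows that carry real data

print(len(missing), kept)
```

If missing is non-empty, the step or the rounding does not match the data and the corresponding bars would disappear from the plot.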
Plot the reindexed data (with xticks spaced tickstep apart):
ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))
tickstep = 2
xticklabels = np.arange(start, stop + tickstep, tickstep).astype(float)
xticks = out.index[out.x.isin(xticklabels)]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
Combined code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.DataFrame({"fid": ["name", "name", "name", "name", "name"], "x": [1.02, 1.3, 2, 5, 10.6], "y1": [0, 1, 0.2, 0.6, 0.1], "y2": [0.3, 0, 0.1, 0.1, 0.4], "y3": [0.7, 0, 0.7, 0.3, 0.5]})
decimals = df.x.astype(str).str.split(".").str[-1].str.len().max()
xstep = 10.0 ** -decimals
start = 0
stop = 16
xfill = np.round(np.arange(start, stop + xstep, xstep), decimals)
out = df.set_index("x").reindex(xfill).reset_index()
ax = out.plot.bar(x="x", stacked=True, width=20, figsize=(10, 3))
tickstep = 2
xticklabels = np.arange(start, stop + tickstep, tickstep).astype(float)
xticks = out.index[out.x.isin(xticklabels)]
ax.set_xticks(xticks)
ax.set_xticklabels(xticklabels)
plt.legend(bbox_to_anchor=(1.05, 1), loc=2)
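My answer below shows how to do stacking with spacing. You can take the solution and tailor the function to your needs; for example, you don't need itertools, a regular counter works just as well. You can also adjust the parameters as needed.
The idea behind the solution:
- Use cumsum to compute the stacked heights
- Draw plain (unstacked) bars with matplotlib and use zorder to control which one is in front.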
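To make the two ideas concrete, here is a small sketch on the question's data, with no plotting involved. The names ycols, tops and zorders are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "fid": ["name"] * 5,
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
    "y2": [0.3, 0, 0.1, 0.1, 0.4],
    "y3": [0.7, 0, 0.7, 0.3, 0.5],
})
ycols = ["y1", "y2", "y3"]

# Row-wise running total: each column now holds the top edge of its segment
tops = df[ycols].cumsum(axis=1)

# Earlier columns get a higher zorder, so the shorter cumulative bars are
# drawn in front of the taller ones that start from the same baseline
zorders = {col: -i for i, col in enumerate(ycols)}

print(tops)
print(zorders)
```

Each bar is drawn from zero up to its cumulative height, and only the part not covered by the bars in front of it stays visible, which gives the stacked look.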
Function
from itertools import count
from math import floor

import matplotlib.pyplot as plt


def plt_stack_spacing(df, figsize=(10, 6), width=0.2, bb_anchor=(1.05, 1)):
    ycol = df.columns[2:]  # assumes the first two columns are fid and x
    df1 = df.iloc[:, [0, 1]]
    df1 = df1.join(df[ycol].cumsum(axis=1))  # cumulative heights per row
    c = count(0, -1)  # either itertools.count or a manually decremented counter
    plt.figure(figsize=figsize)
    for col in ycol:
        # earlier columns get a higher zorder, so shorter bars stay in front
        plt.bar(df1.x, df1[col], label=col, width=width, zorder=next(c))
    xmin = floor(df.x.min())
    xmax = floor(df.x.max())
    xt = [*range(xmin, xmax + 2)]  # integer ticks covering the data range
    plt.xticks(xt)
    plt.legend(bbox_to_anchor=bb_anchor, loc=2)
    plt.show()
Calling the function:
plt_stack_spacing(df,(18,5),0.2,(1.01,1))
Output: [figure omitted]
Benchmark: 100 plots of 300 rows and 4 columns (y1, y2, y3, y4) took 225 seconds (3.75 minutes), without any optimization.
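For comparison, a variant that is not taken from either answer: matplotlib's own bar accepts numeric x positions and a bottom argument, and matplotlib.ticker.MultipleLocator gives a fixed tick interval. The sketch below assumes the question's data and a bar width of 0.2; ycols and bottom are illustrative names, and the Agg backend is used only so it runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import pandas as pd
from matplotlib.ticker import MultipleLocator

df = pd.DataFrame({
    "fid": ["name"] * 5,
    "x": [1.02, 1.3, 2, 5, 10.6],
    "y1": [0, 1, 0.2, 0.6, 0.1],
    "y2": [0.3, 0, 0.1, 0.1, 0.4],
    "y3": [0.7, 0, 0.7, 0.3, 0.5],
})
ycols = ["y1", "y2", "y3"]

fig, ax = plt.subplots(figsize=(10, 3))
bottom = pd.Series(0.0, index=df.index)  # running top of the stack per row
for col in ycols:
    # Bars sit at their true x positions; each segment starts at `bottom`
    ax.bar(df["x"], df[col], width=0.2, bottom=bottom, label=col)
    bottom = bottom + df[col]

ax.xaxis.set_major_locator(MultipleLocator(2))  # one tick every 2 x-units
ax.legend(bbox_to_anchor=(1.05, 1), loc=2)
```

Call plt.show() or fig.savefig(...) afterwards to view the result. The width is in x-axis units here, so for a dense dataset it would need to be chosen relative to the spacing of the x values.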