Group bars with different group sizes in Plotly Express bar plot
Consider the following DataFrame, called data:
Only two elements of the "teacher" column appear twice; the others appear only once.
I made a bar plot with Plotly Express:
import plotly.express as px
px.bar(data.sort_values("start_time", ascending=False), x="teacher", y="start_time", color="start_time",
       color_continuous_scale="Bluered", barmode="group")
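The output is as follows:
I want the bars to sit next to each other rather than being stacked. I think px stacks them (contrary to the behavior shown in their documentation) because I do not have the same number of occurrences for each teacher.
- Is that correct?
- How can I fix it?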
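According to this forum post, what is happening is that plotly.express interprets start_time as a continuous variable. That is why you get a colorbar, but it also makes the plot fall back to stacking the bars instead of grouping them.
As suggested by @Emmanuelle, you can solve this by creating a new column called start_time_str that holds start_time as strings, and passing that column to the color argument. This forces plotly.express to interpret the variable as discrete. However, you then lose the colorbar and get a legend instead: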
data['start_time_str'] = data['start_time'].astype('str')
fig = px.bar(data.sort_values("start_time", ascending=False), x="teacher", y="start_time", color="start_time_str",
             color_continuous_scale="Bluered", barmode="group")
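So, assuming you want to keep the colorbar and still have grouped (side-by-side) bars, you will need a more involved workaround.
You can use plotly.express to plot the first bar for each teacher, which gives you the colorbar, and then use fig.add_trace to add the second bars as graph_objects traces. When adding a second bar you have to specify its color yourself, and for that you need some helper functions: normalize_color_val converts the bar's y value to a normalized value between 0 and 1 relative to the data, and get_color returns the bar's color (as an rgb string) when you pass it the colorscale name and the normalized value.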
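To make the two helper steps concrete before the full code, here is a minimal, dependency-free sketch of the same idea: min-max normalization followed by linear interpolation between two colors. The names normalize and lerp_rgb are hypothetical, and the sketch assumes that Bluered is a plain two-stop scale running from rgb(0, 0, 255) to rgb(255, 0, 0). It is not the Plotly implementation, which handles arbitrary multi-stop scales.

```python
def normalize(value, values):
    """Min-max scale `value` into [0, 1] relative to `values`."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # degenerate case: all values identical
    return (value - lo) / (hi - lo)


def lerp_rgb(low_rgb, high_rgb, t):
    """Linearly interpolate between two RGB tuples and return an rgb string."""
    t = min(max(t, 0.0), 1.0)  # clamp to the valid range
    channels = [round(a + (b - a) * t) for a, b in zip(low_rgb, high_rgb)]
    return "rgb({}, {}, {})".format(*channels)


# Same start_time values as in the example data
start_times = [12, 12, 5, 0, 5, 0, 4, 8, -1, 0, 4]

# Assumed endpoints of a two-stop blue-to-red scale
BLUE, RED = (0, 0, 255), (255, 0, 0)

t = normalize(8, start_times)   # (8 - (-1)) / (12 - (-1)) = 9/13
color = lerp_rgb(BLUE, RED, t)
print(color)                    # rgb(177, 0, 78)
```

The bar with start_time 8 sits about 69% of the way from the minimum (-1) to the maximum (12), so it gets a color about 69% of the way from blue to red.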
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go
data = pd.DataFrame(
    {'teacher': ['Lingrand', 'Milanesio', 'Menin', 'Malot', 'Malot', 'Schminke', 'Cornelli', 'Milanesio', 'Marchello', 'Menin', 'Huet'],
     'start_time': [12, 12, 5, 0, 5, 0, 4, 8, -1, 0, 4]}
)
# This function allows you to retrieve colors from a continuous color scale
# by providing the name of the color scale, and the normalized location between 0 and 1
# Reference:
def get_color(colorscale_name, loc):
    from _plotly_utils.basevalidators import ColorscaleValidator
    # first parameter: Name of the property being validated
    # second parameter: a string, doesn't really matter in our use case
    cv = ColorscaleValidator("colorscale", "")
    # colorscale will be a list of lists: [[loc1, "rgb1"], [loc2, "rgb2"], ...]
    colorscale = cv.validate_coerce(colorscale_name)
    if hasattr(loc, "__iter__"):
        return [get_continuous_color(colorscale, x) for x in loc]
    return get_continuous_color(colorscale, loc)
# Identical to Adam's answer
import plotly.colors
from PIL import ImageColor
def get_continuous_color(colorscale, intermed):
    """
    Plotly continuous colorscales assign colors to the range [0, 1]. This function computes the intermediate
    color for any value in that range.
    Plotly doesn't make the colorscales directly accessible in a common format.
    Some are ready to use:
        colorscale = plotly.colors.PLOTLY_SCALES["Greens"]
    Others are just swatches that need to be constructed into a colorscale:
        viridis_colors, scale = plotly.colors.convert_colors_to_same_type(plotly.colors.sequential.Viridis)
        colorscale = plotly.colors.make_colorscale(viridis_colors, scale=scale)
    :param colorscale: A plotly continuous colorscale defined with RGB string colors.
    :param intermed: value in the range [0, 1]
    :return: color in rgb string format
    :rtype: str
    """
    if len(colorscale) < 1:
        raise ValueError("colorscale must have at least one color")
    hex_to_rgb = lambda c: "rgb" + str(ImageColor.getcolor(c, "RGB"))
    # values at or below 0 map to the first color, values at or above 1 to the last
    if intermed <= 0 or len(colorscale) == 1:
        c = colorscale[0][1]
        return c if c[0] != "#" else hex_to_rgb(c)
    if intermed >= 1:
        c = colorscale[-1][1]
        return c if c[0] != "#" else hex_to_rgb(c)
    # find the two neighboring stops that bracket intermed
    for cutoff, color in colorscale:
        if intermed > cutoff:
            low_cutoff, low_color = cutoff, color
        else:
            high_cutoff, high_color = cutoff, color
            break
    if (low_color[0] == "#") or (high_color[0] == "#"):
        # some color scale names (such as cividis) return:
        # [[loc1, "hex1"], [loc2, "hex2"], ...]
        low_color = hex_to_rgb(low_color)
        high_color = hex_to_rgb(high_color)
    return plotly.colors.find_intermediate_color(
        lowcolor=low_color,
        highcolor=high_color,
        intermed=((intermed - low_cutoff) / (high_cutoff - low_cutoff)),
        colortype="rgb",
    )
def normalize_color_val(color_val, data=data):
    # min-max normalization: maps the smallest start_time to 0 and the largest to 1
    lo, hi = data.start_time.min(), data.start_time.max()
    return (color_val - lo) / (hi - lo)
## add the first bars
fig = px.bar(
    data.sort_values("start_time", ascending=False).loc[~data['teacher'].duplicated()],
    x="teacher", y="start_time", color="start_time",
    color_continuous_scale="Bluered", barmode="group"
)
## add the other bars, these will automatically be grouped
# select the two columns explicitly so the tuple unpacking still works if data has extra columns
repeats = data.sort_values("start_time", ascending=False).loc[data['teacher'].duplicated(), ['teacher', 'start_time']]
for x, y in repeats.itertuples(index=False):
    fig.add_trace(go.Bar(
        x=[x],
        y=[y],
        marker=dict(color=get_color('Bluered', normalize_color_val(y))),
        hovertemplate="teacher=%{x}<br>start_time=%{y}<extra></extra>",
        showlegend=False
    ))
fig.show()
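The split between the "first" bars and the "other" bars above relies on duplicated(), which marks every occurrence of a teacher after the first one. As a rough illustration of what that split produces for this data, here is a plain-Python sketch. The function name split_first_and_repeats is hypothetical and the sketch only mimics the pandas behavior for this simple case.

```python
def split_first_and_repeats(rows):
    """Separate rows into first occurrences and later repeats, keyed on teacher."""
    seen = set()
    first, repeats = [], []
    for teacher, start_time in rows:
        if teacher in seen:
            repeats.append((teacher, start_time))
        else:
            seen.add(teacher)
            first.append((teacher, start_time))
    return first, repeats


teachers = ['Lingrand', 'Milanesio', 'Menin', 'Malot', 'Malot', 'Schminke',
            'Cornelli', 'Milanesio', 'Marchello', 'Menin', 'Huet']
start_times = [12, 12, 5, 0, 5, 0, 4, 8, -1, 0, 4]

first_rows, repeat_rows = split_first_and_repeats(zip(teachers, start_times))
print(repeat_rows)  # [('Malot', 5), ('Milanesio', 8), ('Menin', 0)]
```

The eight first occurrences go to px.bar and share the colorbar, while the three repeats are the rows added one by one with fig.add_trace.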