The outliercolor property of the Python Plotly graph_objects box marker is not working (possible bug)
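I think I have found a bug in the plotly.graph_objects.box marker class, because the outliercolor property has no effect. I followed the reference at https://plotly.github.io/plotly.py-docs/generated/plotly.graph_objects.Box.html#plotly.graph_objects.box.Marker.outliercolor, but changing the outlier color makes no difference.
Here is an example: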
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from matplotlib.colors import LinearSegmentedColormap, to_hex

df_plot = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

cat_var = "species"
num_var = "petal_length"

lvls = df_plot[cat_var].unique()
n_levels = len(lvls)

cmap = LinearSegmentedColormap.from_list("my_palette", ["#111539", "#97A1D9"])
my_palette = [to_hex(j) for j in [cmap(i/n_levels) for i in np.array(range(n_levels))]]

boxes = []
for l in range(n_levels):
    boxes += [
        go.Box(
            name = lvls[l],
            y = df_plot.loc[df_plot.loc[:, cat_var] == lvls[l], num_var].values,
            width = 0.4,
            boxpoints = "outliers",
            marker = {
                "outliercolor": "red", ### there may be a plotly.go bug here
                "color": my_palette[l],
                "size": 30,
                "opacity": 0.5
            }
        )
    ]

fig = go.Figure(data = boxes)
fig.update_layout(
    font = dict(
        size = 18
    ),
    showlegend = False,
    plot_bgcolor = "white",
    hoverlabel = dict(
        font_size = 18,
        font_family = "Rockwell"
    )
)
fig.show()
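This does look like a bug in Plotly, and it could be submitted as a bug report to the Plotly team.
It is worth noting that changing boxpoints = "outliers" to boxpoints = "suspectedoutliers" does produce markers in a different color, so suspectedoutliers behaves as expected. However, you cannot use suspectedoutliers in place of outliers, because suspected outliers are only a subset of all outliers.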
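To make the subset relationship concrete: as I read the description of boxpoints in the Plotly figure reference, points outside the whiskers count as outliers, while only points below 4*Q1 - 3*Q3 or above 4*Q3 - 3*Q1 (that is, more than 3 * IQR beyond a quartile) are highlighted as suspected outliers. The sketch below illustrates that reading. The function name classify_points and the sample numbers are made up for illustration, and the quartiles are passed in by hand rather than computed.

```python
def classify_points(values, q1, q3):
    """Split values into regular points, plain outliers, and suspected outliers."""
    iqr = q3 - q1
    # Whisker fences: points beyond these are shown with boxpoints="outliers".
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Stricter fences, equal to 4*Q1 - 3*Q3 and 4*Q3 - 3*Q1.
    far_lower, far_upper = q1 - 3 * iqr, q3 + 3 * iqr

    result = {"regular": [], "outlier_only": [], "suspected": []}
    for v in values:
        if v < far_lower or v > far_upper:
            result["suspected"].append(v)
        elif v < lower or v > upper:
            result["outlier_only"].append(v)
        else:
            result["regular"].append(v)
    return result


# Hypothetical sample with Q1 = 2 and Q3 = 4 supplied by hand (IQR = 2).
groups = classify_points([1, 2, 3, 4, 5, 8, 12], q1=2, q3=4)
print(groups)
# {'regular': [1, 2, 3, 4, 5], 'outlier_only': [8], 'suspected': [12]}
```

Here 8 lies beyond the whisker fence (7) but inside the stricter fence (10), so it would be an outlier that is not a suspected outlier.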
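You can get the desired behavior by plotting the outliers manually. To do this, you still set boxpoints = "outliers", but then plot the outliers as a separate scatter trace in the desired color, on top of the outliers generated by Plotly.
This is a bit involved, because it means reimplementing the algorithm so that the outliers are determined exactly the way the Plotly library determines them. Unfortunately, there is no way to extract Q1, Q3, or the other statistics from go.Box or Plotly, as these computations are performed by the JavaScript under the hood when the figure renders.
The first thing to note is that Q1 and Q3 are calculated differently by different Python libraries. Plotly outlines its method for calculating percentiles in its documentation, explaining that it uses Method #10 in this short paper.
In Python, a function that calculates percentiles using Method #10 (linear interpolation) looks like this: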
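from math import floor, ceil

## calculate quartiles as outlined in the plotly documentation
def get_percentile(data, p):
    data = sorted(data)  # sort a copy so the caller's data is not reordered
    n = len(data)
    x = min(max(n*p + 0.5, 1), n)  # 1-based position, clamped to the data range
    x1, x2 = floor(x), ceil(x)
    if x1 == x2:  # x lands exactly on a data point, so there is nothing to interpolate
        return data[x1-1]
    y1, y2 = data[x1-1], data[x2-1] # account for zero-indexing
    return y1 + ((x - x1) / (x2 - x1))*(y2 - y1)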
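To see why the choice of method matters, the sketch below compares the position rule n*p + 0.5 against the two methods offered by statistics.quantiles in the Python standard library. The helper name method10_percentile and the sample data are made up for illustration. The helper is my own compact restatement of the rule, not code taken from Plotly.

```python
from math import ceil, floor
from statistics import quantiles


def method10_percentile(data, p):
    """Percentile at the 1-based position n*p + 0.5, linearly interpolated."""
    ordered = sorted(data)
    n = len(ordered)
    x = min(max(n * p + 0.5, 1), n)  # clamp so the indices stay in range
    lo, hi = floor(x), ceil(x)
    if lo == hi:
        return ordered[lo - 1]
    return ordered[lo - 1] + (x - lo) * (ordered[hi - 1] - ordered[lo - 1])


sample = [1, 2, 3, 4, 5, 6, 7, 8]  # made-up data

q1_method10 = method10_percentile(sample, 0.25)
q1_exclusive = quantiles(sample, n=4, method="exclusive")[0]
q1_inclusive = quantiles(sample, n=4, method="inclusive")[0]

print(q1_method10, q1_exclusive, q1_inclusive)
# 2.5 2.25 2.75
```

Three rules give three different first quartiles for the same eight values, which is enough to move a point from one side of a fence to the other.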
Now, to extract the outliers from the dataset, you need to subset the data: any value below (Q1 - 1.5 * IQR) or above (Q3 + 1.5 * IQR), where IQR = Q3 - Q1, is considered an outlier.
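As a small worked example of the fence rule, the sketch below runs it on a made-up list of eight values using only the standard library. The names method10_percentile and find_outliers are hypothetical helpers for this illustration, and the percentile helper assumes the n*p + 0.5 position rule.

```python
from math import ceil, floor


def method10_percentile(data, p):
    """Percentile at the 1-based position n*p + 0.5, linearly interpolated."""
    ordered = sorted(data)
    n = len(ordered)
    x = min(max(n * p + 0.5, 1), n)  # clamp so the indices stay in range
    lo, hi = floor(x), ceil(x)
    if lo == hi:
        return ordered[lo - 1]
    return ordered[lo - 1] + (x - lo) * (ordered[hi - 1] - ordered[lo - 1])


def find_outliers(data):
    """Return the values outside the 1.5 * IQR fences, plus the fences."""
    q1 = method10_percentile(data, 0.25)
    q3 = method10_percentile(data, 0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in data if v < lower or v > upper], (lower, upper)


sample = [10, 12, 13, 14, 15, 16, 18, 40]  # made-up data
outliers, fences = find_outliers(sample)

# Q1 = 12.5, Q3 = 17.0, IQR = 4.5, so the fences are 5.75 and 23.75.
print(outliers, fences)
# [40] (5.75, 23.75)
```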
Putting it all together:
from math import floor, ceil
import numpy as np
import pandas as pd
import plotly.graph_objects as go
from matplotlib.colors import LinearSegmentedColormap, to_hex

df_plot = pd.read_csv('https://raw.githubusercontent.com/mwaskom/seaborn-data/master/iris.csv')

cat_var = "species"
num_var = "petal_length"

lvls = df_plot[cat_var].unique()
n_levels = len(lvls)

cmap = LinearSegmentedColormap.from_list("my_palette", ["#111539", "#97A1D9"])
my_palette = [to_hex(j) for j in [cmap(i/n_levels) for i in np.array(range(n_levels))]]

## calculate quartiles as outlined in the plotly documentation
def get_percentile(data, p):
    data = sorted(data)  # sort a copy so the caller's array is not reordered
    n = len(data)
    x = min(max(n*p + 0.5, 1), n)  # 1-based position, clamped to the data range
    x1, x2 = floor(x), ceil(x)
    if x1 == x2:  # x lands exactly on a data point, so there is nothing to interpolate
        return data[x1-1]
    y1, y2 = data[x1-1], data[x2-1] # account for zero-indexing
    return y1 + ((x - x1) / (x2 - x1))*(y2 - y1)

def get_fences(data):
    q1, q3 = get_percentile(data, 0.25), get_percentile(data, 0.75)
    iqr = q3-q1
    return (q1 - (1.5*iqr), q3 + (1.5*iqr))

boxes = []
for l in range(n_levels):
    data = df_plot.loc[df_plot.loc[:, cat_var] == lvls[l], num_var].values
    lower, upper = get_fences(data)
    outliers = data[(data < lower) | (data > upper)]
    print(outliers)
    boxes += [
        go.Box(
            name = lvls[l],
            y = data,
            width = 0.4,
            boxpoints = "outliers",
            marker = {
                "outliercolor": "red", ### there may be a plotly.go bug here
                "color": my_palette[l],
                "size": 30,
                "opacity": 0.5
            }
        ),
        go.Scatter(
            x = [lvls[l]]*len(outliers),  # one category label per outlier
            y = outliers,
            mode = 'markers',
            marker=dict(color="red", size=28, opacity=0.5)
        )
    ]

fig = go.Figure(data = boxes)
fig.update_layout(
    font = dict(
        size = 18
    ),
    showlegend = False,
    plot_bgcolor = "white",
    hoverlabel = dict(
        font_size = 18,
        font_family = "Rockwell"
    )
)
fig.show()
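As a way of checking our work, you will notice that the manually added outliers, which are drawn slightly smaller, line up with the outliers determined by Plotly. (You can make the manually added outliers larger to cover up the Plotly-generated outliers that are not the desired color.)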