Seaborn scatterplot does not color correctly 'hue'

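I'm having some trouble coloring the markers of a scatterplot. I have a simple dataframe with a value "pos" and two other values, "af_x" and "af_y". I want to color the markers based on some conditions on af_x and af_y, but since I don't have any column I could use as the hue, I created my own column "color".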

       pos      af_x      af_y  color 
0  3671023  0.200000  0.333333    2.0
1  4492071  0.176471  0.333333    2.0
2  4492302  0.222222  0.285714    2.0
3  4525905  0.298246  0.234043    2.0
4  4520905  0.003334  0.234043    1.0
5  4520905  0.400098  0.000221    0.0
6  4520905  0.001134  0.714043    1.0
7  4520905  0.559008  0.010221    0.0

Now I create the scatterplot with seaborn and a seaborn palette, like this:

sns.scatterplot(data = df, x="af_x", y="af_y", hue="color", palette = "hsv", s=40, legend=False)

But the result is the following: as you can see, one hue value does not get its own color, because only two colors appear, blue and red.

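Now something really strange happens: to work around the problem, I built my own palette and set it on the seaborn instance. But the scatterplot is not colored with the shades I chose; instead it uses colors that I had used earlier in another script, and there is no way to change them. Here is the code: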

           #violet      #green      #orange
 colors = ['#747FE3', '#8EE35D', '#E37346']
 sns.set_palette(sns.color_palette(colors))

 sns.scatterplot(data = df,  x="af_x", y="af_y", hue="color", s=40, legend=False)

Here is the whole script, so that you can reproduce it:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

lst = [[3671023, 0.200000, 0.333333], [4492071, 0.176471, 0.333333],
      [4492302, 0.222222, 0.285714], [4525905, 0.298246, 0.234043],
      [4520905, 0.003334, 0.234043], [4520905, 0.400098, 0.000221], 
      [4520905, 0.001134, 0.714043], [4520905, 0.559008, 0.010221]
      ]
df = pd.DataFrame(lst, columns =['pos', 'af_x', 'af_y'])

afMin=0.1
afMax=0.9

df['color']=np.nan
for index in df.index:
  afx=df.loc[index, "af_x"]
  afy=df.loc[index, "af_y"]
  if ((afx >= afMin and afx <= afMax) and (afy < afMin or afy > afMax)):
      df.loc[index, "color"] = 0
  elif ((afy >= afMin and afy <= afMax) and (afx < afMin or afx > afMax)):
      df.loc[index, "color"] = 1
  elif ((afy >= afMin and afy <= afMax) and (afx >= afMin and afx <= afMax)):
      df.loc[index, "color"] = 2

sns.scatterplot(data = df,  x="af_x", y="af_y", hue="color", palette = "hsv", s=40, 
legend=False)

plt.savefig("stack_why_hsv.png")

           #violet      #green      #orange
colors = ['#747FE3', '#8EE35D', '#E37346']
sns.set_palette(sns.color_palette(colors))

sns.scatterplot(data = df,  x="af_x", y="af_y", hue="color", s=40, legend=False)
plt.savefig("stack_why_personal.png")

Thanks to anyone who can help!

The problem with the first example is that the hsv palette has the same color at its start and at its end. This is because the "h" in "hsv" (the hue) is a circular variable, going from 0 to 360 degrees. With three numeric hue values, the colors are taken uniformly spaced over the range of the colormap: the red from the start, the cyan from the center, and again the red from the end. So hsv isn't the most suitable color scheme in this case. See matplotlib's available colormaps and seaborn's extensions.

[Image: the hsv colormap]
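To see the wrap-around for yourself, here is a minimal sketch that samples matplotlib's "hsv" colormap at the start, the center and the end of its range. It only uses matplotlib; the Agg backend line is an assumption so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

cmap = plt.get_cmap("hsv")

# Sample the colormap at the start, the center and the end of its range.
start, mid, end = (cmap(v) for v in (0.0, 0.5, 1.0))

# The start and the end are both (nearly the same) red; the center is cyan.
for name, rgba in (("start", start), ("center", mid), ("end", end)):
    print(f"{name:>6}: {to_hex(rgba)}")
```

Three evenly spaced values therefore end up with only two visually distinct colors, which matches the plot in the question.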

For your second example, sns.set_palette() sets matplotlib's color cycle, but seaborn itself doesn't always use it. When a numeric hue is given, seaborn uses a sequential colormap by default instead of the color cycle. From the documentation:

The default treatment of the hue (and to a lesser extent, size) semantic, if present, depends on whether the variable is inferred to represent “numeric” or “categorical” data. In particular, numeric variables are represented with a sequential colormap by default, and the legend entries show regular “ticks” with values that may or may not exist in the data.

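The easiest way to use your custom palette is to pass it directly to the function (there is no need to call sns.color_palette(), because a seaborn palette is internally just a list of colors):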

colors = ['#747FE3', '#8EE35D', '#E37346']
sns.scatterplot(data = df,  x="af_x", y="af_y", hue="color", palette=colors, s=40)
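As a variation, the palette can also be given as a dict that maps each hue value to a color, which makes the assignment explicit instead of relying on the sort order of the values. The following is only a sketch on a three-row subset of the data from the question; the names `palette` and `used` are made up for illustration, and the Agg backend is an assumption so that it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no display needed
import pandas as pd
import seaborn as sns
from matplotlib.colors import to_hex

# Three rows taken from the question, one per value of "color".
df = pd.DataFrame({
    "af_x": [0.200000, 0.003334, 0.400098],
    "af_y": [0.333333, 0.234043, 0.000221],
    "color": [2.0, 1.0, 0.0],
})

# Explicit mapping from hue value to color (violet, green, orange).
palette = {0.0: '#747FE3', 1.0: '#8EE35D', 2.0: '#E37346'}

ax = sns.scatterplot(data=df, x="af_x", y="af_y", hue="color",
                     palette=palette, s=40, legend=False)

# Collect the face colors that were actually drawn.
used = {to_hex(c) for c in ax.collections[0].get_facecolors()}
print(sorted(used))
```

The dict must contain an entry for every value that occurs in the hue column.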

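PS: the palette set by set_palette is used by scatterplot when the hue is categorical. Here is an example. I also added the preferred way to set values to a selection of rows; this matters for large dataframes. Note that boolean operations on arrays need quite a few brackets here.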

afMin = 0.1
afMax = 0.9

df['color'] = ""
afx = df["af_x"]
afy = df["af_y"]
df.loc[((afx >= afMin) & (afx <= afMax) & ((afy < afMin) | (afy > afMax))), "color"] = "a"
df.loc[((afy >= afMin) & (afy <= afMax) & ((afx < afMin) | (afx > afMax))), "color"] = "b"
df.loc[((afy >= afMin) & (afy <= afMax) & (afx >= afMin) & (afx <= afMax)), "color"] = "c"

colors = ['#747FE3', '#8EE35D', '#E37346']
sns.set_palette(sns.color_palette(colors))

sns.scatterplot(data=df, x="af_x", y="af_y", hue="color", s=40)
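If the brackets get hard to read, the same three categories can be written with Series.between and numpy.select. This is just an alternative sketch, not part of the original answer; `x_in`, `y_in` and `conditions` are names introduced here, and it assumes the columns contain no NaN (so that "not in range" means "below afMin or above afMax").

```python
import numpy as np
import pandas as pd

lst = [[3671023, 0.200000, 0.333333], [4492071, 0.176471, 0.333333],
       [4492302, 0.222222, 0.285714], [4525905, 0.298246, 0.234043],
       [4520905, 0.003334, 0.234043], [4520905, 0.400098, 0.000221],
       [4520905, 0.001134, 0.714043], [4520905, 0.559008, 0.010221]]
df = pd.DataFrame(lst, columns=['pos', 'af_x', 'af_y'])

afMin = 0.1
afMax = 0.9

# between() is inclusive on both ends, like ">= afMin and <= afMax".
x_in = df["af_x"].between(afMin, afMax)
y_in = df["af_y"].between(afMin, afMax)

conditions = [
    x_in & ~y_in,  # only af_x inside the range -> "a"
    y_in & ~x_in,  # only af_y inside the range -> "b"
    x_in & y_in,   # both inside the range      -> "c"
]
# Rows matching none of the conditions keep the empty string.
df["color"] = np.select(conditions, ["a", "b", "c"], default="")
print(df)
```

The resulting string column is categorical for seaborn, so it can be passed as hue exactly as in the example above.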