Change bar color for a data selection in a seaborn histogram (or plt)

Question

Suppose I have a data frame like this:

X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2":X2})

I am running the following plotting routine:

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0,i])
    sns.distplot(a[c], ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')

plt.tight_layout()
plt.show()

This produces:

Now I would like to be able to select a subset of the data frame a. Say:

b = a[(a['X2'] <4)]

and highlight the selection in b in the histograms shown above. For example, if the first row of b is [32:0] for X3 and [0:5] for X2, the desired output would be:

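Is it possible to do this with the for loop above and sns? Many thanks!

Edit: I am also happy with a matplotlib solution, if that is simpler.

Edit 2:

If it helps, this would be similar to doing the following: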

b = a[(a['X3'] >38)]

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))

for i, c in enumerate(a.columns):
   sns.boxplot(x=a[c], ax=axes[0,i])
   sns.distplot(a[c], ax = axes[1,i])
   sns.distplot(b[c], ax = axes[1,i])

   axes[1, i].set(yticklabels=[])
   axes[1, i].set(xlabel='')
   axes[1, i].set(ylabel='')

plt.tight_layout()
plt.show()

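which produces the following:

However, what I want is to give those bars a different color in the first plot instead! I also considered setting ylim to the extent of the blue plot only, so that the orange one would not distort the shape of the blue distribution. That is still not feasible: in reality I have about 10 histograms to show, and setting ylim would be much the same as using sharey=True, which I am trying to avoid so that I can show the true shape of each distribution.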

Answer 1

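I wrote the following code on the understanding that your goal is to give part of the histogram a different color, based on data extracted under a given condition. Use np.histogram() to get the array of counts and the array of bins. Get the index of the entry closest to the value in the first row of the data extracted under the condition. Use that index to change the color of the histogram bars. The other plot can be handled the same way.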

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2":X2})

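# Compute counts and bin edges once, so the bars drawn below and the
# index lookup use the same bins (20 bins is an arbitrary choice).
hist3, bins3 = np.histogram(a["X3"], bins=20)

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))
for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0,i])
    # Note: distplot is deprecated in recent seaborn releases (histplot replaces it).
    sns.distplot(a[c], bins=bins3 if c == "X3" else None, ax = axes[1,i])
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')

b = a[(a['X2'] <4)]
# Index of the bin edge closest to the X3 value in the first selected row.
idx = np.abs(bins3 - b['X3'].iloc[0]).argmin()

# Recolor every bar to the left of that edge.
for patch in axes[1,0].patches[:idx]:
    patch.set_color("red")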

plt.tight_layout()
plt.show()
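
The lookup step in the code above (finding where a selected value falls among the histogram bins) can also be sketched on its own, without any plotting. This is a variant that returns the bar containing the value rather than the entry closest to it. The helper name bar_index_for_value and the choice of 20 bins are assumptions made for illustration; neither is part of numpy or seaborn.

```python
import numpy as np


def bar_index_for_value(edges, value):
    """Return the index of the histogram bar whose bin contains `value`.

    `edges` is the bin-edge array returned by np.histogram, so it has one
    more entry than there are bars. Hypothetical helper, for illustration.
    """
    # side="right" counts the edges <= value; subtracting 1 gives the bar index.
    idx = np.searchsorted(edges, value, side="right") - 1
    # np.histogram closes the last bin on the right, and values outside the
    # range have no bar, so clip to the nearest valid bar index.
    return int(np.clip(idx, 0, len(edges) - 2))


rng = np.random.default_rng(2021)
x3 = rng.normal(34, 2, 200)            # synthetic data, as in the question
counts, edges = np.histogram(x3, bins=20)
k = bar_index_for_value(edges, x3[0])  # bar that holds the first sample
```

With a matplotlib histogram drawn from the same edges, axes.patches[k] would be the bar to recolor, for example with set_facecolor("red").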

Answer 2

I think I found a solution, inspired by the previous answer and this video:

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(2021)
X2 = np.random.normal(10, 3, 200)
X3 = np.random.normal(34, 2, 200)

a = pd.DataFrame({"X3": X3, "X2":X2})
b = a[(a['X3'] < 30)]


hist_idx=[]

for i, c in enumerate(a.columns):
    bin_ = np.histogram(a[c], bins=20)[1]
    hist = np.where(np.logical_and(bin_<=max(b[c]), bin_>min(b[c])))
    hist_idx.append(hist)
    

f, axes = plt.subplots(2, 2,  gridspec_kw={"height_ratios":(.10, .30)}, figsize = (13, 4))

for i, c in enumerate(a.columns):
    sns.boxplot(x=a[c], ax=axes[0,i])
    axes[1, i].hist(a[c], bins = 20)
    axes[1, i].set(yticklabels=[])
    axes[1, i].set(xlabel='')
    axes[1, i].set(ylabel='')
    
for it, index in enumerate(hist_idx):
    # index[0] holds positions of bin edges; edge j is the right edge of bar j - 1.
    for edge_pos in index[0]:
        axes[1, it].patches[edge_pos - 1].set_fc("red")


plt.tight_layout()
plt.show()

This produces the following for b = a[(a['X3'] < 30)]:

or, for b = a[(a['X3'] > 36)]:

Thought I would leave this here. It is niche, but it might help someone in the future!
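
For readers who want the same effect without the index arithmetic on bin edges, here is a minimal sketch of one alternative, not the method used above: histogram the selection with the same bin edges as the full data and recolor every bar that holds at least one selected point. The helper name highlight_selection and its parameters are hypothetical, and the data is synthetic in the same way as in the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def highlight_selection(ax, values, selected, bins=20, color="red"):
    """Draw a histogram of `values` on `ax` and recolor bars holding selected points.

    Hypothetical helper for illustration. Returns a boolean mask with one
    entry per bar (True where the bar was recolored).
    """
    # Use explicit edges so the drawn bars and the lookup share the same bins.
    edges = np.histogram_bin_edges(values, bins=bins)
    _, _, patches = ax.hist(values, bins=edges)
    # Count the selected points per bin, using the same edges.
    sel_counts, _ = np.histogram(selected, bins=edges)
    mask = sel_counts > 0
    for patch, hit in zip(patches, mask):
        if hit:
            patch.set_facecolor(color)
    return mask


np.random.seed(2021)
a = pd.DataFrame({"X3": np.random.normal(34, 2, 200),
                  "X2": np.random.normal(10, 3, 200)})
b = a[a["X3"] > 36]

fig, axes = plt.subplots(1, 2, figsize=(13, 3))
masks = {c: highlight_selection(axes[i], a[c], b[c])
         for i, c in enumerate(a.columns)}
```

Calling plt.show() afterwards displays the figure. Because each bar keeps its original height, the shape of the full distribution is not distorted, which was the concern raised in the question about overlaying a second histogram.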
