Get data array from a Seaborn pairplot

I used the seaborn pairplot function and would like to extract a data array from it.

import seaborn as sns

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")

I would like to get the array of the points that I have highlighted in black in the resulting plot.

Thanks.

Just this one line:

data = iris[iris['species'] == 'setosa']['sepal_length']

You are interested in the blue line, which corresponds to the 'setosa' species. To filter the iris DataFrame, I created this filter:

iris['species'] == 'setosa'

This is a boolean array whose value is True where the corresponding row of the 'species' column in the iris DataFrame is 'setosa', and False otherwise. With this line of code:

iris[iris['species'] == 'setosa']

I apply the filter to the DataFrame so that only the rows associated with the 'setosa' species are kept. Finally, I extract the 'sepal_length' column:

iris[iris['species'] == 'setosa']['sepal_length']
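
The filtering steps above can be sketched end to end as follows. To keep the example runnable without downloading the iris dataset, it uses a small made-up DataFrame with the same two column names; the values and the variable names (df, mask, values) are assumptions for illustration only. It also uses .loc to select rows and column in one step and to_numpy() to get a plain NumPy array, which is one possible way to do it rather than the only one.

```python
import pandas as pd

# Hypothetical stand-in for the iris DataFrame (values are made up).
df = pd.DataFrame({
    "species": ["setosa", "setosa", "versicolor", "virginica", "setosa"],
    "sepal_length": [5.1, 4.9, 7.0, 6.3, 4.7],
})

# Step 1: boolean mask, True where the row belongs to 'setosa'.
mask = df["species"] == "setosa"

# Step 2: keep only the masked rows and pick the 'sepal_length' column.
setosa_lengths = df.loc[mask, "sepal_length"]

# Step 3: convert the pandas Series to a plain NumPy array.
values = setosa_lengths.to_numpy()

print(mask.tolist())
print(values)
```

With the real dataset, replacing df by iris should give the same result as the chained expression iris[iris['species'] == 'setosa']['sepal_length'].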

If I plot the KDE of this data array with the following code:

data = iris[iris['species'] == 'setosa']['sepal_length']
sns.kdeplot(data)

I get the curve from the plot above that you are interested in.

Note that the KDE here is calculated differently from the one in the plot above.
I quote this reference:

The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify this is a probability density and not a probability. The difference is the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.
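
The point of the quote can be checked numerically. The sketch below uses scipy.stats.gaussian_kde as a stand-in for the KDE (it is not claimed to reproduce the curve seaborn draws in the pairplot) and synthetic, tightly clustered samples instead of the iris data; the sample parameters, seed, and variable names are assumptions chosen so that the density clearly exceeds one.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic samples concentrated around 5.0 (illustrative, not iris data).
rng = np.random.default_rng(0)
samples = rng.normal(loc=5.0, scale=0.1, size=200)

kde = gaussian_kde(samples)

# Evaluate the density on a grid: these are the y-axis values of a KDE plot.
grid = np.linspace(4.0, 6.0, 500)
density = kde(grid)
peak = density.max()

# Total area under the curve over a range that covers all the mass.
total_area = kde.integrate_box_1d(0.0, 10.0)

# Probability of an interval = area under the curve over that interval.
prob_interval = kde.integrate_box_1d(4.9, 5.1)

print("peak density:", peak)
print("total area:", total_area)
print("P(4.9 <= x <= 5.1):", prob_interval)
```

The peak density is well above one, while the total area stays at one and the interval probability stays between zero and one, which is why the y-axis of a density plot is best read for relative comparison only.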