Get data array from a Seaborn pairplot

Question

I used the seaborn pairplot function and would like to extract a data array.

import seaborn as sns

iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")

I want to get the array of the points that I have shown in black below:

Thanks.

Answer 1

Just this one line:

data = iris[iris['species'] == 'setosa']['sepal_length']
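The one-liner returns a pandas Series that keeps the original row index. If a plain NumPy array is what you need, a minimal sketch is to do the mask and the column selection in a single `.loc` call and then call `Series.to_numpy()`. The small DataFrame below is a hypothetical stand-in for `sns.load_dataset("iris")` (same column names, only a few rows) so the snippet runs without downloading anything.

```python
import pandas as pd

# Hypothetical stand-in for sns.load_dataset("iris"): same column names,
# only a few rows, so the snippet runs offline.
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

# Row mask and column label in one .loc call; .to_numpy() drops the index
# and returns a plain NumPy array.
data = iris.loc[iris["species"] == "setosa", "sepal_length"].to_numpy()
print(data)  # [5.1 4.9]
```

With the real dataset the same line gives the full setosa `sepal_length` column as an array.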

You are interested in the blue line, which is the 'setosa' species. To filter the iris dataframe, I created this filter:

iris['species'] == 'setosa'

This is a boolean array: its value is True where the corresponding row of the 'species' column in the iris dataframe is 'setosa', and False otherwise. With this line of code:
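To see what such a filter looks like on its own, here is a tiny sketch on a made-up `species` Series (hypothetical values, not the real column): comparing a Series with a scalar yields one boolean per row.

```python
import pandas as pd

# Hypothetical species labels standing in for iris["species"].
species = pd.Series(["setosa", "versicolor", "setosa", "virginica"])

# Element-wise comparison: True where the label equals "setosa".
mask = species == "setosa"
print(mask.tolist())  # [True, False, True, False]
print(mask.dtype)     # bool
```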

iris[iris['species'] == 'setosa']

I apply the filter to the dataframe so that only the rows associated with the 'setosa' species are kept. Finally, I extract the 'sepal_length' column:

iris[iris['species'] == 'setosa']['sepal_length']
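The pairplot draws one curve per `hue` level, so you may want the arrays for every species rather than only setosa. A minimal sketch, again on a hypothetical few-row stand-in for the iris DataFrame, is to iterate over a `groupby` on the `species` column, which yields one `(name, Series)` pair per species:

```python
import pandas as pd

# Hypothetical stand-in for sns.load_dataset("iris").
iris = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "species": ["setosa", "setosa", "versicolor", "versicolor",
                "virginica", "virginica"],
})

# One NumPy array per hue level, keyed by species name.
per_species = {
    name: values.to_numpy()
    for name, values in iris.groupby("species")["sepal_length"]
}
print(per_species["setosa"])  # [5.1 4.9]
```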

If I plot a KDE of this data array with the following code:

data = iris[iris['species'] == 'setosa']['sepal_length']
sns.kdeplot(data)

I get the plot you are interested in.
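If the standalone curve looks taller than the matching setosa curve on the pairplot diagonal, one possible reason, assuming seaborn 0.11 or later, is normalization: with `hue` set, pairplot draws the diagonal with `kdeplot`, whose `common_norm=True` default scales each group's density by that group's share of the rows, so with three equally sized species each curve has area 1/3 instead of 1. The sketch below illustrates only that arithmetic. The `kde` helper is a hand-rolled NumPy Gaussian KDE with Scott's-rule bandwidth (not seaborn's own code) and the groups are made-up values.

```python
import numpy as np

def kde(samples, grid):
    """Hand-rolled 1-D Gaussian KDE (Scott's-rule bandwidth) evaluated on `grid`."""
    x = np.asarray(samples, dtype=float)
    h = x.std(ddof=1) * x.size ** (-1 / 5)      # Scott's rule in 1-D
    z = (grid[:, None] - x[None, :]) / h        # one Gaussian bump per sample
    return np.exp(-0.5 * z**2).sum(axis=1) / (x.size * h * np.sqrt(2 * np.pi))

# Hypothetical, equally sized groups standing in for the three species.
groups = {
    "setosa": [4.9, 5.0, 5.1, 5.3, 4.8],
    "versicolor": [5.9, 6.1, 6.0, 5.7, 6.4],
    "virginica": [6.5, 6.9, 6.3, 7.1, 6.7],
}
n_total = sum(len(v) for v in groups.values())
grid = np.linspace(3.0, 9.0, 2001)
dx = grid[1] - grid[0]

areas = {}
for name, samples in groups.items():
    alone = kde(samples, grid)                  # normalized on its own: area 1
    shared = alone * len(samples) / n_total     # scaled by the group's share of rows
    # Riemann-sum approximation of the area under each curve.
    areas[name] = (alone.sum() * dx, shared.sum() * dx)
    print(name, round(areas[name][0], 3), round(areas[name][1], 3))
```

In seaborn itself, passing `common_norm=False` to `sns.kdeplot(data=iris, x="sepal_length", hue="species")` should normalize each species separately, so its setosa curve ought to line up with the single-array plot above.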

The KDE is computed differently from the plot above.
I quote this reference:

The y-axis in a density plot is the probability density function for the kernel density estimation. However, we need to be careful to specify this is a probability density and not a probability. The difference is the probability density is the probability per unit on the x-axis. To convert to an actual probability, we need to find the area under the curve for a specific interval on the x-axis. Somewhat confusingly, because this is a probability density and not a probability, the y-axis can take values greater than one. The only requirement of the density plot is that the total area under the curve integrates to one. I generally tend to think of the y-axis on a density plot as a value only for relative comparisons between different categories.
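The quoted points can be checked numerically. In the sketch below, a hypothetical narrow normal density (mean 5.0, standard deviation 0.2) stands in for a KDE curve: its peak is well above one, the total area is about one, and the probability of an interval is the area over that interval, approximated here with a simple Riemann sum.

```python
import numpy as np

# Hypothetical narrow normal density standing in for a KDE curve.
mu, sigma = 5.0, 0.2
x = np.linspace(mu - 6 * sigma, mu + 6 * sigma, 10_001)
dx = x[1] - x[0]
pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

peak = pdf.max()                      # about 1.99: a density, not a probability
total = pdf.sum() * dx                # about 1.0: total area under the curve
inside = (x >= 4.8) & (x <= 5.2)
p_interval = pdf[inside].sum() * dx   # P(4.8 <= X <= 5.2), about 0.68
print(round(peak, 2), round(total, 3), round(p_interval, 3))
```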


Tags: python, python-3.x, kernel-density, seaborn