Get data array from a Seaborn pairplot
I used the seaborn pairplot function and would like to extract a data array.
import seaborn as sns
iris = sns.load_dataset("iris")
sns.pairplot(iris, hue="species")
I would like to get the array of the points that I show below in black:
Thank you.
Just this line:
data = iris[iris['species'] == 'setosa']['sepal_length']
You are interested in the blue line, so the 'setosa' species. To filter the iris dataframe, I created this filter:
iris['species'] == 'setosa'
This is a boolean array whose value is True where the corresponding row in the 'species' column of the iris dataframe is 'setosa', and False otherwise. With this line of code:
iris[iris['species'] == 'setosa']
I apply the filter to the dataframe so that only the rows associated with the 'setosa' species are extracted. Finally, I extract the 'sepal_length' column:
iris[iris['species'] == 'setosa']['sepal_length']
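The same boolean-mask filtering can be sketched on a small hand-made dataframe, so it runs without downloading the iris dataset. The dataframe `df` and its values are made up for illustration; using `.loc` with `.to_numpy()` is just one way to end up with a plain NumPy array rather than a pandas Series.

```python
import pandas as pd

# Hypothetical stand-in for the iris dataframe (values are illustrative only).
df = pd.DataFrame({
    "species": ["setosa", "versicolor", "setosa", "virginica"],
    "sepal_length": [5.1, 7.0, 4.9, 6.3],
})

# Boolean mask: True where the row belongs to 'setosa', False otherwise.
mask = df["species"] == "setosa"

# Select the matching rows and one column in a single step,
# then convert the resulting Series to a NumPy array.
values = df.loc[mask, "sepal_length"].to_numpy()

print(mask.tolist())
print(values)
```

With the real data, replacing `df` with `iris` should give the 'setosa' sepal lengths as an array.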
If I plot a KDE for this data array with the following code:
data = iris[iris['species'] == 'setosa']['sepal_length']
sns.kdeplot(data)
I get:
which is the plot above that you are interested in.
The KDE is computed differently from the plot above.
I quote this reference:
The y-axis in a density plot is the probability density function for
the kernel density estimation. However, we need to be careful to
specify this is a probability density and not a probability. The
difference is the probability density is the probability per unit on
the x-axis. To convert to an actual probability, we need to find the
area under the curve for a specific interval on the x-axis. Somewhat
confusingly, because this is a probability density and not a
probability, the y-axis can take values greater than one. The only
requirement of the density plot is that the total area under the curve
integrates to one. I generally tend to think of the y-axis on a
density plot as a value only for relative comparisons between
different categories.
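To make the quoted point concrete, here is a minimal sketch of a hand-rolled Gaussian KDE in NumPy. It is not what seaborn does internally; the sample values and the `bandwidth` are assumptions chosen by hand so that the data are tightly clustered. It checks numerically that the curve integrates to about one even though its peak is above one.

```python
import numpy as np

# Illustrative sample with a small spread (made-up values, not the iris data).
sample = np.array([5.0, 5.1, 4.9, 5.0, 5.2, 4.8, 5.1, 5.0])
bandwidth = 0.1  # assumed kernel width, picked by hand for this sketch

# Evaluation grid wide enough that the tails are negligible at both ends.
grid = np.linspace(4.0, 6.0, 2001)

# Gaussian KDE: the average of normal pdfs centred on each sample point.
z = (grid[:, None] - sample[None, :]) / bandwidth
density = np.exp(-0.5 * z**2).sum(axis=1) / (
    len(sample) * bandwidth * np.sqrt(2 * np.pi)
)

dx = grid[1] - grid[0]
area = density.sum() * dx  # Riemann-sum approximation of the integral
peak = density.max()

print(f"area under curve ~ {area:.4f}")
print(f"peak density     ~ {peak:.2f}")
```

The area stays close to one while the peak exceeds one, which is why the y-axis of a density plot should be read as a density and not as a probability.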