Get points to create KDE plot

Question

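I am trying to get the points of a KDE plot so that I can send them through an API and have the plot drawn by a front end. For example, suppose I have the following data: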

import pandas as pd
import seaborn as sns

df = pd.DataFrame({
    'x': [3000.0, 2897.0, 4100.0, 2539.28, 5000.0,
          3615.0, 2562.05, 2535.0, 2413.0, 2246.0],
    'y': [1, 2, 1, 1, 1, 2, 1, 3, 1, 1],
})

# Weighted KDE of x, using y as the weights
sns.kdeplot(x=df['x'], weights=df['y'])

I plotted it with seaborn's kdeplot, which gave me this plot:

Now I want to send some points of this plot through an API. My idea was to use KernelDensity from sklearn to estimate the density at a number of points, so I used this code:

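import numpy as np
from sklearn.neighbors import KernelDensity

# 30 evenly spaced x values at which to evaluate the density
x_points = np.linspace(0, df['x'].max(), 30)
kde = KernelDensity()
kde.fit(df['x'].values.reshape(-1, 1), sample_weight=df['y'])

# score_samples returns the log-density, so exponentiate it below
logprob = kde.score_samples(x_points.reshape(-1, 1))

new_df = pd.DataFrame({'x': x_points, 'y': np.exp(logprob)})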

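If I draw this as a line plot, it looks nothing like the seaborn kdeplot.

My question is: given a DataFrame and the kdeplot shown above, how can I get the probability at a given point x on this plot?

Edit: added the sns.kdeplot code used to produce the plot.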

Answer 1

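Why does the sklearn plot look different? Because KernelDensity's bandwidth defaults to 1, which is far too small for x values in the thousands. It should be much larger. Changing a single line fixes this: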

kde = KernelDensity(bandwidth=500)
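Rather than hard-coding 500, the bandwidth can also be derived from the data. The sketch below computes a weighted Scott's-rule bandwidth with NumPy alone. The helper name `scott_bandwidth` is made up for illustration, and the formula (weighted standard deviation times n_eff^(-1/5)) reflects my understanding of what SciPy does for weighted 1-D data, so treat it as an assumption rather than a guaranteed match.

```python
import numpy as np

def scott_bandwidth(x, weights):
    """Hypothetical helper: Scott's-rule bandwidth for weighted 1-D data."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    # Effective sample size shrinks when the weights are uneven
    n_eff = w.sum() ** 2 / (w ** 2).sum()
    # Weighted, bias-corrected standard deviation of the data
    std = float(np.sqrt(np.cov(x, aweights=w)))
    return std * n_eff ** (-1.0 / 5.0)

x = [3000.0, 2897.0, 4100.0, 2539.28, 5000.0,
     3615.0, 2562.05, 2535.0, 2413.0, 2246.0]
w = [1, 2, 1, 1, 1, 2, 1, 3, 1, 1]

h = scott_bandwidth(x, w)
print(h)
```

The result can then be passed as `KernelDensity(bandwidth=h)`. For this data it lands in the hundreds, the same order of magnitude as the 500 used above.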

Seaborn, on the other hand, sets the bandwidth automatically, and SciPy lets you do the same, as explained here.
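As a minimal sketch of the SciPy route: `scipy.stats.gaussian_kde` accepts `weights` and picks its bandwidth by Scott's rule unless `bw_method` says otherwise. To my knowledge seaborn's `kdeplot` forwards its `bw_method` argument to this same class, so the values should be close to the plotted curve, though I have not verified an exact match. The data is repeated so the snippet runs on its own.

```python
import numpy as np
from scipy.stats import gaussian_kde

x = np.array([3000.0, 2897.0, 4100.0, 2539.28, 5000.0,
              3615.0, 2562.05, 2535.0, 2413.0, 2246.0])
w = np.array([1, 2, 1, 1, 1, 2, 1, 3, 1, 1], dtype=float)

# Weighted Gaussian KDE; the bandwidth is chosen automatically (Scott's rule)
kde = gaussian_kde(x, weights=w)

# Density on a grid of 30 points, ready to be sent to a front end
x_points = np.linspace(0, x.max(), 30)
density = kde(x_points)

# Density at one specific x (evaluation always returns an array)
density_at_3000 = kde(3000.0)[0]
print(density_at_3000)
```

Note that these are density values, not probabilities: the probability of falling in an interval is the area under the curve, which `kde.integrate_box_1d(low, high)` computes.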

Seaborn is a layer on top of matplotlib and returns matplotlib Axes, so you can use the same approach as the answer to this question about getting data out of a matplotlib plot.

import matplotlib.pyplot as plt
plt.gca().get_lines()[0].get_xydata()

The output is what you are after, one (x, density) pair per row:

array([[5.70706380e+02, 7.39051159e-07],
       [6.01382697e+02, 9.00695337e-07],
       [6.32059015e+02, 1.09427429e-06],
       [6.62735333e+02, 1.32531892e-06],
       [6.93411651e+02, 1.60015322e-06],
       [7.24087969e+02, 1.92597554e-06],
       [7.54764286e+02, 2.31094202e-06],
       [7.85440604e+02, 2.76425104e-06],
       [8.16116922e+02, 3.29622720e-06],
       ...])
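To get the value at a single point x, one option is to interpolate linearly between the extracted points; the same array can also be serialized for the API. The sketch below uses only the first three rows of the output above as stand-in data. `density_at` and the payload layout are illustrative names, not part of any library.

```python
import json
import numpy as np

# Stand-in for plt.gca().get_lines()[0].get_xydata():
# the first three rows of the output shown above
xy = np.array([
    [5.70706380e+02, 7.39051159e-07],
    [6.01382697e+02, 9.00695337e-07],
    [6.32059015e+02, 1.09427429e-06],
])

def density_at(x, xy):
    """Hypothetical helper: linearly interpolate the curve at x."""
    # np.interp expects increasing x values, as in the rows above
    return float(np.interp(x, xy[:, 0], xy[:, 1]))

# JSON-friendly payload: a list of {"x": ..., "y": ...} points
payload = json.dumps([{"x": float(px), "y": float(py)} for px, py in xy])

print(density_at(600.0, xy))
print(payload)
```

Outside the range of the extracted points `np.interp` returns the end values, so this is only meaningful for x inside the plotted range.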
