How to evaluate the information retained in UMAP?

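I am trying to find an attribute for UMAP similar to explained_variance_ratio_ (in sklearn's PCA), but I can't find anything like it. With PCA I can use explained_variance_ratio_ for different values of n_components and compare the results. Is there anything in Python that I can use for UMAP?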

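You cannot easily estimate the variance explained by UMAP because, unlike PCA, it is a non-linear form of dimensionality reduction. A more detailed dive follows.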

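PCA tries to find projections in the high-dimensional space that capture as much variance as possible. You project the data onto these orthogonal axes, and you can estimate the variance captured by each one relative to the variance in the original data. It is always a linear operation, which is why explained variance is well defined. You can have a look at this post about variance explained or this one about PCA.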
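To make the PCA side concrete, here is a minimal sketch of the comparison described in the question. The iris dataset and the chosen values of n_components are assumptions for illustration only. The last lines check that each ratio is the variance of one component divided by the total variance of the data, which only works because the projection is linear.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data  # illustrative data: 150 samples, 4 features

# Fit PCA for several values of n_components and compare the variance captured.
for k in (1, 2, 3, 4):
    ratios = PCA(n_components=k).fit(X).explained_variance_ratio_
    print(k, np.round(ratios, 3), "total:", round(float(ratios.sum()), 3))

# Each ratio is a component's variance divided by the total variance
# of the original features (sample variance, ddof=1).
pca = PCA(n_components=2).fit(X)
total_var = X.var(axis=0, ddof=1).sum()
manual = pca.explained_variance_ / total_var
print(np.allclose(manual, pca.explained_variance_ratio_))
```

With all four components kept, the ratios sum to 1, since a full PCA is only a rotation of the centered data.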

UMAP is a form of non-linear dimensionality reduction. From the help page, UMAP uses so-called simplicial complexes to capture the topological space of your features, and from there obtains a low-dimensional reduction. You can think of it as a high-dimensional graph that is more geared towards capturing the inter-connectedness between data points than the variance. Hence, as of now, I am not aware of a way to retrieve the variance explained in a UMAP. You can also check out the author's reply on GitHub.
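Since UMAP is about preserving connections between neighbouring points rather than variance, one thing you could compare across n_components is how well local neighbourhoods survive the reduction. The sketch below uses sklearn.manifold.trustworthiness for that. To be clear, this is a substitute I am suggesting, not an explained-variance equivalent and not something UMAP itself reports. The helper trust_by_components is a hypothetical name, the digits subset is illustrative, and PCA is used only as a stand-in reducer so the example runs without umap-learn installed.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import trustworthiness


def trust_by_components(X, make_reducer, ks, n_neighbors=5):
    """Hypothetical helper: trustworthiness score for each embedding size.

    make_reducer(k) must return an object with fit_transform, e.g. a PCA
    or a umap.UMAP instance configured with n_components=k.
    """
    scores = {}
    for k in ks:
        embedding = make_reducer(k).fit_transform(X)
        # Score in [0, 1]; higher means neighbours in the embedding
        # were also neighbours in the original space.
        scores[k] = trustworthiness(X, embedding, n_neighbors=n_neighbors)
    return scores


X = load_digits().data[:300]  # small illustrative subset

# PCA is a stand-in here; it is not the method under discussion.
scores = trust_by_components(X, lambda k: PCA(n_components=k), ks=(2, 5, 10))
for k, score in scores.items():
    print(k, round(score, 3))
```

With umap-learn installed, passing lambda k: umap.UMAP(n_components=k) as make_reducer should give the same kind of comparison for UMAP. Keep in mind the score depends on n_neighbors and says nothing about global structure.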
