ValueError with eps=0.100000 when I try to reduce data dimensionality. What could be the reason?

Question

I am trying to use scikit-learn's GaussianRandomProjection on my dataset, which has shape 1599 x 11, as follows:

from sklearn import random_projection

transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(wine_data.values[:, :11])

When I do this, I get the following error message:

ValueError: eps=0.100000 and n_samples=1599 lead to a
target dimension of 6323 which is larger than the original 
space with n_features=1

I don't understand the error. What exactly does it mean, and how can I use GaussianRandomProjection to reduce the dimensionality of my data?

Answer 1

Quoting the official scikit-learn documentation for GaussianRandomProjection on the n_components parameter:

Dimensionality of the target projection space.

n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.

It should be noted that Johnson-Lindenstrauss lemma can yield very conservative estimated of the required number of components as it makes no assumption on the structure of the dataset.
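To see where the number in the error message comes from, the bound can be evaluated directly. This is a minimal sketch that assumes the values from the question (1599 samples, the default eps of 0.1) and uses scikit-learn's johnson_lindenstrauss_min_dim helper; the variable names are only illustrative.

```python
from sklearn.random_projection import johnson_lindenstrauss_min_dim

# Values taken from the question: 1599 rows and the default eps of 0.1.
n_samples = 1599
eps = 0.1

# Minimum number of components that the Johnson-Lindenstrauss bound
# asks for, given only the sample count and eps (it ignores n_features).
min_dim = johnson_lindenstrauss_min_dim(n_samples=n_samples, eps=eps)
print(min_dim)  # 6323, the target dimension reported in the error
```

The bound depends only on the number of samples and eps, so with just 11 input features it asks for far more components than the data has.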

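In your case, the estimator ends up targeting a 6323-dimensional projection space after "reducing" the dimensionality. That is clearly not what you want, since you are trying to reduce the number of dimensions, not increase it. This happens because n_components defaults to 'auto', which derives the target dimension from the number of samples and eps rather than from the number of features. I suggest you first pick the output dimension you want (for example, 8) and then check whether the model behaves as expected.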

from sklearn.random_projection import GaussianRandomProjection

transformer = GaussianRandomProjection(n_components=8)  # set your desired output dimension
X_new = transformer.fit_transform(wine_data.values[:, :11])
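For anyone who wants to try this without the wine dataset, here is a self-contained sketch. It assumes a synthetic random matrix with the same shape as the data in the question (1599 x 11); the array X, the seed, and the auto_failed flag are stand-ins for illustration, not part of the original code.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Synthetic stand-in for wine_data.values[:, :11] (assumption: same shape only).
rng = np.random.default_rng(0)
X = rng.normal(size=(1599, 11))

# With the default n_components='auto', fitting raises the ValueError
# from the question because the computed target exceeds n_features.
auto_failed = False
try:
    GaussianRandomProjection().fit(X)
except ValueError as err:
    auto_failed = True
    print("auto failed:", err)

# With an explicit n_components smaller than n_features, the projection works.
transformer = GaussianRandomProjection(n_components=8, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)  # (1599, 8)
```

Passing random_state is optional; it only makes the random projection matrix reproducible between runs.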

Good luck.


python

machine-learning

dimensionality-reduction

scikit-learn