ValueError with eps=0.100000 when I try to reduce data dimensionality. What could be the reason for this?

I am trying to use scikit-learn's GaussianRandomProjection on my dataset, which has shape 1599 x 11, as follows:

from sklearn import random_projection

transformer = random_projection.GaussianRandomProjection()
X_new = transformer.fit_transform(wine_data.values[:, :11])

When I do this, I get an error message:

ValueError: eps=0.100000 and n_samples=1599 lead to a
target dimension of 6323 which is larger than the original 
space with n_features=1

I don't understand the error. What exactly does it mean? How can I use GaussianRandomProjection to reduce the dimensionality of my data?

Quoting the official scikit-learn documentation on the n_components parameter of GaussianRandomProjection:

Dimensionality of the target projection space.

n_components can be automatically adjusted according to the number of samples in the dataset and the bound given by the Johnson-Lindenstrauss lemma. In that case the quality of the embedding is controlled by the eps parameter.

It should be noted that Johnson-Lindenstrauss lemma can yield very conservative estimated of the required number of components as it makes no assumption on the structure of the dataset.
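The bound mentioned in the quote can be checked by hand. The sketch below assumes the formula given in the scikit-learn documentation for johnson_lindenstrauss_min_dim, n_components >= 4 log(n_samples) / (eps^2 / 2 - eps^3 / 3); the helper name jl_min_dim is made up for illustration and is not part of scikit-learn.

```python
import math


def jl_min_dim(n_samples: int, eps: float) -> int:
    """Hypothetical helper: minimum target dimension from the
    Johnson-Lindenstrauss bound, truncated to an integer."""
    denominator = eps ** 2 / 2 - eps ** 3 / 3
    return int(4 * math.log(n_samples) / denominator)


# Default eps=0.1 with the 1599 samples from the question.
auto_dim = jl_min_dim(1599, 0.1)
print(auto_dim)  # 6323, the target dimension reported in the error message

# Even a very loose eps still asks for far more than 11 dimensions.
loose_dim = jl_min_dim(1599, 0.9)
print(loose_dim)  # 182
```

Since the bound depends only on the number of samples and eps, not on the number of features, tuning eps alone is unlikely to bring the target dimension below the original feature count for a dataset this narrow.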

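In your case, n_components is left at its default of 'auto', so the estimator derives the target dimension from the Johnson-Lindenstrauss bound and arrives at a 6323-dimensional projection after "reducing" the data. That is clearly not what you want, since the goal is to shrink the feature space rather than enlarge it, and this is exactly why scikit-learn raises the ValueError. I suggest you first choose the output dimension you want (e.g. 8) and then check that the model behaves as expected.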

from sklearn.random_projection import GaussianRandomProjection

transformer = GaussianRandomProjection(n_components=8)  # set the desired output dimension
X_new = transformer.fit_transform(wine_data.values[:, :11])
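If you want to try this without the original data, here is a minimal end-to-end sketch. It assumes a random array of the same shape (1599 x 11) as a stand-in for wine_data, and the random_state values are arbitrary choices for reproducibility.

```python
import numpy as np
from sklearn.random_projection import GaussianRandomProjection

# Assumption: synthetic stand-in with the same shape as the wine data.
rng = np.random.default_rng(0)
X = rng.normal(size=(1599, 11))

# With the default n_components='auto', fitting fails on data this narrow.
try:
    GaussianRandomProjection().fit(X)
    auto_failed = False
except ValueError as exc:
    auto_failed = True
    print(exc)

# With an explicit target dimension, the projection goes through.
transformer = GaussianRandomProjection(n_components=8, random_state=0)
X_new = transformer.fit_transform(X)
print(X_new.shape)  # (1599, 8)
```

Keep in mind that with an explicitly chosen n_components the distance-preservation guarantee of the lemma no longer applies, so it is worth checking the projected data against whatever downstream task you care about.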

Good luck.