Can I use t-SNE when the dimensionality is larger than the number of data points?
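I am using t-SNE with the MATLAB code from this website (https://lvdmaaten.github.io/tsne/). However, whenever I run the program on data whose dimensionality is larger than the number of data points, I get an error. The error is always reported on this line: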
M = M(:,ind(1:initial_dims));
The error is
Index exceeds matrix dimensions.
Error in tsne (line 62)
M = M(:,ind(1:initial_dims));
I called the tsne function with this command in MATLAB:
output = tsne(input, [], 2, 640, 30);
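The input has size 162x640: the dimensionality is 640 and the number of data points is 162. The program below is the code from the website above.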
function ydata = tsne(X, labels, no_dims, initial_dims, perplexity)
%TSNE Performs symmetric t-SNE on dataset X
%
% mappedX = tsne(X, labels, no_dims, initial_dims, perplexity)
% mappedX = tsne(X, labels, initial_solution, perplexity)
%
% The function performs symmetric t-SNE on the NxD dataset X to reduce its
% dimensionality to no_dims dimensions (default = 2). The data is
% preprocessed using PCA, reducing the dimensionality to initial_dims
% dimensions (default = 30). Alternatively, an initial solution obtained
% from an other dimensionality reduction technique may be specified in
% initial_solution. The perplexity of the Gaussian kernel that is employed
% can be specified through perplexity (default = 30). The labels of the
% data are not used by t-SNE itself, however, they are used to color
% intermediate plots. Please provide an empty labels matrix [] if you
% don't want to plot results during the optimization.
% The low-dimensional data representation is returned in mappedX.
%
%
% (C) Laurens van der Maaten, 2010
% University of California, San Diego
if ~exist('labels', 'var')
    labels = [];
end
if ~exist('no_dims', 'var') || isempty(no_dims)
    no_dims = 2;
end
if ~exist('initial_dims', 'var') || isempty(initial_dims)
    initial_dims = min(50, size(X, 2));
end
if ~exist('perplexity', 'var') || isempty(perplexity)
    perplexity = 30;
end
% First check whether we already have an initial solution
if numel(no_dims) > 1
    initial_solution = true;
    ydata = no_dims;
    no_dims = size(ydata, 2);
    perplexity = initial_dims;
else
    initial_solution = false;
end
% Normalize input data
X = X - min(X(:));
X = X / max(X(:));
X = bsxfun(@minus, X, mean(X, 1));
% Perform preprocessing using PCA
if ~initial_solution
    disp('Preprocessing data using PCA...');
    if size(X, 2) < size(X, 1)
        C = X' * X;
    else
        C = (1 / size(X, 1)) * (X * X');
    end
    [M, lambda] = eig(C);
    [lambda, ind] = sort(diag(lambda), 'descend');
    M = M(:,ind(1:initial_dims));
    lambda = lambda(1:initial_dims);
    if ~(size(X, 2) < size(X, 1))
        M = bsxfun(@times, X' * M, (1 ./ sqrt(size(X, 1) .* lambda))');
    end
    X = bsxfun(@minus, X, mean(X, 1)) * M;
    clear M lambda ind
end
% Compute pairwise distance matrix
sum_X = sum(X .^ 2, 2);
D = bsxfun(@plus, sum_X, bsxfun(@plus, sum_X', -2 * (X * X')));
% Compute joint probabilities
P = d2p(D, perplexity, 1e-5); % compute affinities using fixed perplexity
clear D
% Run t-SNE
if initial_solution
    ydata = tsne_p(P, labels, ydata);
else
    ydata = tsne_p(P, labels, no_dims);
end
I have tried to understand this code, but I cannot make sense of the part where the error occurs.
if size(X, 2) < size(X, 1)
    C = X' * X;
else
    C = (1 / size(X, 1)) * (X * X');
end
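Why is this condition needed? Since X has size 162x640, the else branch is executed. I think this is where the problem lies: in the else branch, C ends up with size 162x162. However, shortly afterwards, in the line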
M = M(:,ind(1:initial_dims));
initial_dims, which equals 640, is used as an index. Am I using this code the wrong way, or does it simply not work for the dataset I am using?
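According to the documentation:
"The data is preprocessed using PCA, reducing the dimensionality to initial_dims dimensions (default = 30)." So initial_dims is the number of principal components to keep, not the dimensionality of your input. For a 162x640 input, C is 162x162, so eig returns only 162 eigenvectors and asking for 640 of them raises the index error. Leave this parameter at its default the first time you use the function.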
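As a rough sketch of how the call could be made safe for the 162x640 case, one option is to cap initial_dims before passing it in. The names X and max_dims and the random placeholder data are assumptions for illustration only. The cap of N - 1 comes from the rank of mean-centered data, not from the tsne documentation.

```matlab
% Hypothetical stand-in for the 162x640 input from the question.
X = rand(162, 640);

% A mean-centered N-by-D matrix has rank at most min(N - 1, D), so that is
% the largest number of PCA dimensions that can carry any information.
max_dims = min(size(X, 1) - 1, size(X, 2));

% 30 is the default quoted in the help text of tsne.m; never exceed the cap.
initial_dims = min(30, max_dims);

% With tsne.m from the website on the path, the call would then be:
% output = tsne(X, [], 2, initial_dims, 30);
```

Any value of initial_dims up to max_dims should avoid the index error, although the help text suggests a much smaller value is what the author had in mind.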
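The condition if size(X, 2) < size(X, 1) chooses which matrix to eigendecompose, in the spirit of an economy-size SVD. When there are fewer dimensions than data points it uses the DxD matrix X' * X. Otherwise it uses the NxN matrix X * X'. Working with the smaller of the two makes the eigendecomposition faster. In the else case, the later bsxfun(@times, X' * M, ...) line converts the eigenvectors of X * X' back into principal directions in the original D-dimensional space.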
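The following sketch illustrates why the else branch works: eigenvectors of the small NxN matrix can be mapped to unit-norm eigenvectors of the large DxD covariance matrix. The sizes N, D, k and the random data are assumptions chosen for illustration. The scaling mirrors the bsxfun line in tsne.m, but the sketch is a standalone check, not a copy of the function.

```matlab
N = 20; D = 50; k = 5;                 % hypothetical sizes with N < D
X = randn(N, D);
X = bsxfun(@minus, X, mean(X, 1));     % center each column

% Small N-by-N problem, as in the else branch of tsne.m
C = (1 / N) * (X * X');
C = (C + C') / 2;                      % guard against round-off asymmetry
[V, L] = eig(C);
[lambda, ind] = sort(diag(L), 'descend');
V = V(:, ind(1:k));                    % keep the k leading eigenvectors
lambda = lambda(1:k);

% If C*v = lambda*v, then u = X'*v satisfies (1/N)*X'*X*u = lambda*u and
% norm(u)^2 = N*lambda, so dividing by sqrt(N*lambda) gives unit length.
M = bsxfun(@times, X' * V, (1 ./ sqrt(N .* lambda))');

% Verify against the full D-by-D covariance matrix
S = (1 / N) * (X' * X);
residual = norm(S * M - bsxfun(@times, M, lambda'), 'fro');
orth_err = norm(M' * M - eye(k), 'fro');
fprintf('eigen residual: %g, orthonormality error: %g\n', residual, orth_err);
```

Both printed quantities should be near machine precision. Note that k must stay below N here: the smallest eigenvalues of C are numerically zero after centering, and dividing by their square roots would not give meaningful directions.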