Pairwise Similarity and Sorting Samples

Question

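The following is a homework problem I am trying to solve:

Visualization of the similarity matrix. Represent every sample with a four-dimensional vector (sepal length, sepal width, petal length, petal width). For every two samples, compute their pairwise similarity. You may do so using the Euclidean distance or other metrics. This leads to a similarity matrix where the element (i,j) stores the similarity between samples i and j. Please sort all samples so that samples from the same category appear together. Visualize the matrix using the function imagesc() or any other function.

Here is the code I have written so far: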

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
iris_distance = table2array(iris_copy); % convert the table to an array

% pairwise similarity
D = pdist(iris_distance); % calculate the Euclidean distance and store the result in D
W = squareform(D); % convert to squareform
figure()
imagesc(W); % visualize the matrix

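I think my code answers most of the question correctly. My problem is how to sort all the samples so that samples from the same category appear together, because I dropped the class names when I created the copy. Is the data already sorted by the conversion to squareform? Any other suggestions? Thank you!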

Answer 1

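The matrix should be in the same order as the original data, since squareform does not reorder anything. While you could sort it afterwards, the easiest solution is to actually sort your data by class after line 2 and before line 3 of your code.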

load('iris.mat'); % create a table of the data
iris.Properties.VariableNames = {'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width' 'Class'}; % change the variable names to their actual meaning
% Sort the table here on the "Class" attribute. Don't forget to change the table name
% in the next line too if you need to.
iris_copy = iris(1:150,{'Sepal_Length' 'Sepal_Width' 'Petal_Length' 'Petal_Width'}); % make a copy of the (numerical) features of the table
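The sort step that the comment above leaves open could look like the following minimal sketch. It assumes the table is named iris and has a Class column, as in the question. The six-row table built here is only a stand-in for iris.mat so that the snippet runs on its own, and iris_sorted is a name introduced for illustration. pdist and squareform require the Statistics and Machine Learning Toolbox.

```matlab
% Hypothetical six-row stand-in for the table loaded from iris.mat
Sepal_Length = [5.1; 7.0; 6.3; 4.9; 6.4; 5.8];
Sepal_Width  = [3.5; 3.2; 3.3; 3.0; 3.2; 2.7];
Petal_Length = [1.4; 4.7; 6.0; 1.4; 4.5; 5.1];
Petal_Width  = [0.2; 1.4; 2.5; 0.2; 1.5; 1.9];
Class = {'setosa'; 'versicolor'; 'virginica'; 'setosa'; 'versicolor'; 'virginica'};
iris = table(Sepal_Length, Sepal_Width, Petal_Length, Petal_Width, Class);

% Sort the rows by class label so samples of the same class are adjacent
iris_sorted = sortrows(iris, 'Class');

% Extract the numeric features from the sorted table
X = table2array(iris_sorted(:, {'Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'}));

% Pairwise Euclidean distances; rows and columns follow the row order of X
W = squareform(pdist(X));

figure();
imagesc(W);   % same-class samples now form blocks along the diagonal
colorbar;
```

Because W holds distances, small values mean similar samples, so each class should show up as a low-valued block along the diagonal.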

Consider using sortrows:

tblB = sortrows(tblA,'RowNames') sorts a table based on its row names. Row names of a table label the rows along the first dimension of the table. If tblA does not have row names, that is, if tblA.Properties.RowNames is empty, then sortrows returns tblA.
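Sorting afterwards, which the answer mentions as an option, amounts to applying the same permutation to both the rows and the columns of the distance matrix. A minimal sketch, assuming the class labels are still available as a cell array; the variables features and labels are made-up stand-ins for iris_distance and iris.Class:

```matlab
% Hypothetical stand-ins: features plays the role of iris_distance, labels of iris.Class
features = [5.1 3.5 1.4 0.2;
            7.0 3.2 4.7 1.4;
            6.3 3.3 6.0 2.5;
            4.9 3.0 1.4 0.2];
labels = {'setosa'; 'versicolor'; 'virginica'; 'setosa'};

% W(i,j) is the distance between rows i and j, in the original row order
W = squareform(pdist(features));

% squareform does not reorder anything: entry (1,2) is the distance between rows 1 and 2
assert(abs(W(1,2) - norm(features(1,:) - features(2,:))) < 1e-12);

% idx is a permutation that groups identical labels together
[sorted_labels, idx] = sort(labels);

% Apply the permutation to rows and columns at the same time
W_sorted = W(idx, idx);

figure();
imagesc(W_sorted);
colorbar;
```

Indexing with W(idx, idx) keeps the matrix symmetric and the diagonal at zero. Permuting only the rows or only the columns would break the (i,j) correspondence.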

