Encode each training image as a histogram of the number of times each vocabulary element shows up for Bag of Visual Words

Question

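I want to implement Bag of Visual Words in MATLAB. I extract SURF features from the images and cluster them into k clusters with k-means. I now have k centroids, and I want to know how many times each cluster is used by assigning each image feature to its nearest centroid. Finally, I want to create a histogram for each image.

I tried the knnsearch function, but it does not work in this case.

Here is my MATLAB code: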

clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name}; 
for k=1:numel(files)
    fullFileName = fullfile(folder, files{k});
    H = fspecial('log');
    image=imfilter(imread(fullFileName),H);
    temp =  detectSURFFeatures(image);
    [im_features, temp] = extractFeatures(image, temp);
    features{k}= im_features;

end

features = vertcat(features{:});
image_feats = [];
[assignments,centers] = kmeans(double(features),500);
vocab = centers';

I have all of the image features in the features array and the cluster centers in the centers array.

Answer 1

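You're almost there. You don't need knnsearch at all here. The assignments variable tells you which cluster each input feature was mapped to. It is an N x 1 vector, where N is the total number of examples, that is, the number of rows in the input matrix features. Each value assignments(i) is the ID of the cluster that example i (row i of features) was mapped to, so the corresponding centroid is centers(assignments(i), :). Given how you call kmeans, every element of assignments is an integer from 1 to 500, where 500 is the requested number of clusters.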
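To make the relationship between assignments and centers concrete, here is a minimal sketch on synthetic 2-D data rather than SURF descriptors. The data, the seed, and the names X, i, and closest are assumptions made for illustration only.

```matlab
% Toy data: two well-separated 2-D blobs of 20 points each (illustrative only)
rng(0);
X = [randn(20, 2); randn(20, 2) + 8];
k = 2;
[assignments, centers] = kmeans(X, k);

% assignments has one cluster ID per row of X
disp(size(assignments));

% The centroid that represents example i is indexed by assignments(i)
i = 5;
centroid_i = centers(assignments(i), :);

% Sanity check: for well-separated blobs like these, that centroid
% is also the closest one to example i
d = sum((centers - X(i, :)).^2, 2);   % squared distance to each centroid
[~, closest] = min(d);
```

Here closest should match assignments(i), which is why no separate nearest-neighbour search is needed for the training features.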

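Let's start with the simple case where your codebook is built from a single image. In that case, all you have to do is compute a histogram of the assignments variable. The output histogram h is a 500-element vector, where h(i) is the number of times a feature used centroid i as its representation in the codebook.

Simply use the histcounts function and specify the bin edges so that they line up with the cluster IDs. You have to account for the final bin: every bin excludes its right edge except the last one, which includes both edges, so add one extra edge at the end. Without it, IDs 499 and 500 would be counted in the same bin.

Something like this would work: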

h = histcounts(assignments, 1 : 501);

If you want something simpler and don't want to worry about specifying the final bin edge, you can use accumarray to get the same counts:

h = accumarray(assignments, 1);

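In effect, accumarray works on key-value pairs: the key is the centroid an example mapped to, and the value is 1 for every key. accumarray groups all of the values whose keys in assignments are the same and applies a function to each group. The default is to sum the values, which amounts to computing a histogram. Note that accumarray returns a column vector whose length is the largest key present, whereas histcounts returns a row vector.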
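The two approaches can be compared on a small hand-made vector. This is only a sketch: the toy assignments vector and the choice of 5 clusters are assumptions made for illustration, and the size argument passed to accumarray is an addition that pins the output length.

```matlab
% Toy example: 6 features assigned to 5 clusters (cluster 4 is never used)
assignments = [1; 3; 3; 2; 5; 3];
num_clusters = 5;

% Edges 1..6 give one bin per cluster ID; the last bin is [5, 6]
h1 = histcounts(assignments, 1 : num_clusters + 1);

% Same counts with accumarray; the size argument fixes the length at 5
h2 = accumarray(assignments, 1, [num_clusters 1]).';

% Dropping the extra edge merges IDs 4 and 5 into the last bin [4, 5]
h_wrong = histcounts(assignments, 1 : num_clusters);

disp(h1);       % 1 1 3 0 1
disp(h2);       % 1 1 3 0 1
disp(h_wrong);  % 1 1 3 1
```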

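However, you want to do this for several images, not just one. For a Bag of Visual Words problem there will certainly be more than one training image in the database, so you want the feature histogram of each image. The same concept still applies, but I suggest maintaining a separate variable that records how many features were detected in each image. You can then index into assignments to pull out the centroid IDs that belong to each image and build the histograms one image at a time. The result can be stored in a 2-D matrix in which each row is the histogram of one image. Remember that in kmeans, each row of the output tells you which cluster that example was assigned to, independently of the other examples. So you run kmeans once on the features of the whole training set and then slice assignments carefully to extract the cluster IDs for each input image.

Therefore, modify your code so that it looks something like this: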

clc;
clear;
close all;
folder = 'CarData/TrainImages/cars';
filePattern = fullfile(folder, '*.pgm');
f=dir(filePattern);
files={f.name}; 
num_features = zeros(numel(files), 1); % New - for keeping track of # of features per image
features = cell(numel(files), 1); % Preallocate one cell per image
H = fspecial('log'); % The filter is the same for every image, so build it once
for k=1:numel(files)
    fullFileName = fullfile(folder, files{k});
    img = imfilter(imread(fullFileName), H); % img avoids shadowing the built-in image function
    points = detectSURFFeatures(img);
    im_features = extractFeatures(img, points);
    num_features(k) = size(im_features, 1); % New - # of features per image
    features{k} = im_features;
end

features = vertcat(features{:});
num_clusters = 500; % Added to make the code adaptive
[assignments,centers] = kmeans(double(features), num_clusters);

counter = 1; % Keeps track of where we need to slice in assignments

% Go through each image and find their histograms
features_hist = zeros(numel(files), num_clusters); % Records the per image histograms
for k = 1 : numel(files)
    a = assignments(counter : counter + num_features(k) - 1); % Get the assignments
    h = histcounts(a, 1 : num_clusters + 1);
    % Or, passing the output size so that unused trailing clusters still get a zero count:
    % h = accumarray(a, 1, [num_clusters 1]).'; % Transpose to make it a row

    % Place in final output
    features_hist(k, :) = h;

    % Increment counter 
    counter = counter + num_features(k);
end
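
The loop above can also be written without explicit slicing. The following is a minimal sketch on toy data, not a replacement that has been tested on the car dataset: it assumes repelem is available (R2015a or later), and the name img_idx and the toy values are made up for illustration. It labels every feature with the index of the image it came from and lets accumarray count (image, cluster) pairs directly.

```matlab
% Toy stand-ins for the real variables (illustrative values only)
num_features = [2; 1; 3];            % features found in images 1, 2, 3
assignments  = [1; 2; 2; 4; 4; 1];   % cluster ID of each stacked feature
num_clusters = 4;
num_images   = numel(num_features);

% Image index of every row in assignments: [1;1;2;3;3;3]
img_idx = repelem((1 : num_images).', num_features);

% Count how often each (image, cluster) pair occurs
features_hist = accumarray([img_idx, assignments], 1, [num_images, num_clusters]);

disp(features_hist);
% 1 1 0 0
% 0 1 0 0
% 1 0 0 2
```

Because the output size is passed explicitly, images with no detected features and clusters that an image never uses still get zero entries.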

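features_hist is now a numel(files) x 500 matrix in which each row is the histogram of one image. The remaining job is to train a supervised machine learning algorithm (SVM, neural network, etc.), using each image's histogram as the input features and the label you assigned to that image as the expected output. The result is a learned model: when you have a new image, compute its SURF features, represent them as a histogram of visual words as we did above, and feed that histogram into the classification model to get the class or label the image represents.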
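
For a new image there is no assignments vector, because the image was not part of the kmeans run. This is the step where a nearest-neighbour lookup such as knnsearch fits: each new descriptor is matched to its closest centroid and the matches are counted. The sketch below uses small synthetic 2-D arrays in place of real data; in the pipeline above, centers would come from kmeans and new_features would be the double-converted output of extractFeatures for the new image. The name new_features and the toy values are assumptions.

```matlab
% Toy codebook with 3 centroids in 2-D (real SURF descriptors have more dimensions)
centers = [0 0; 10 10; 0 10];
num_clusters = size(centers, 1);

% Toy descriptors of a "new" image, one per row
new_features = [1 1; 9 9; 0.5 0; 1 9];

% For each row of new_features, find the index of the nearest row of centers
idx = knnsearch(centers, new_features);

% Count how often each centroid was chosen; row vector to match features_hist
h_new = accumarray(idx, 1, [num_clusters 1]).';

disp(h_new);   % 2 1 1
```

h_new has the same layout as one row of features_hist, so it can be passed to whatever classifier was trained on those rows.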

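P.S. Deep learning / CNNs do a better job at this but take much longer to train. If you are judging purely on classification performance, don't use Bag of Visual Words. It is, however, very quick to implement and is known to perform moderately well, although that of course depends on the kinds of images you want to classify.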


matlab

image-processing

histogram

surf

knn