Clustering an image using Gaussian mixture models

Question

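I want to cluster a binary image using a GMM (Gaussian mixture model), and I also want to plot the cluster centroids on the binary image itself.

I am using this as my reference: http://in.mathworks.com/help/stats/gaussian-mixture-models.html

Here is my initial code: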

    I=im2double(imread('sil10001.pbm'));
    K = I(:);
    mu=mean(K);
    sigma=std(K);
    P=normpdf(K, mu, sigma);
    Z = norminv(P,mu,sigma);
    X = mvnrnd(mu,sigma,1110);
    X=reshape(X,111,10);

    scatter(X(:,1),X(:,2),10,'ko');

    options = statset('Display','final');
    gm = fitgmdist(X,2,'Options',options);

    idx = cluster(gm,X);
    cluster1 = (idx == 1);
    cluster2 = (idx == 2);

    scatter(X(cluster1,1),X(cluster1,2),10,'r+');
    hold on
    scatter(X(cluster2,1),X(cluster2,2),10,'bo');
    hold off
    legend('Cluster 1','Cluster 2','Location','NW')

    P = posterior(gm,X);

    scatter(X(cluster1,1),X(cluster1,2),10,P(cluster1,1),'+')
    hold on
    scatter(X(cluster2,1),X(cluster2,2),10,P(cluster2,1),'o')
    hold off
    legend('Cluster 1','Cluster 2','Location','NW')
    clrmap = jet(80); colormap(clrmap(9:72,:))
    ylabel(colorbar,'Component 1 Posterior Probability')

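But the problem is that I cannot plot the cluster centroids obtained from the GMM on the original binary image. How should I do this?

**Now suppose I have 10 such images in a sequence and I want to store their mean positions in two cell arrays. How should I do that? This is my code for the new problem:**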

    images=load('gait2go.mat'); %load the matrix file
    for i=1:10

        I{i}=images.result{i};
        I{i}=im2double(I{i});

        %determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
        Idims=size(I{i});
        whites=true(Idims(1),Idims(2));

        df=I{i};
        %we add up the various color channels
        for colori=1:size(df,3)
            whites=whites & df(:,:,colori)>0.5;
        end

        %choose indices of 'white' pixels as coordinates of data
        [datax datay]=find(whites);

        %cluster data into 10 clumps
        K = 10;               % number of mixtures/clusters
        cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
            'maxiter',1000,'start','cluster');

        %get clusterwise means
        meanx=zeros(K,1);
        meany=zeros(K,1);
        for i=1:K
            meanx(i)=mean(datax(cInd==i));
            meany(i)=mean(datay(cInd==i));
        end

        xc{i}=meanx(i); %cell array containing the position of the mean for the 10 images
        xb{i}=meany(i);

        figure;
        gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
        axis equal;
        hold on;
        scatter(meany,-meanx,20,'+'); %same funky coordinates

    end

I am able to segment the 10 images, but the mean values are not stored in the cell arrays xc and xb. They contain only [] in place of the mean values.

Answer 1

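I believe you must have made a simple mistake in the plotting, which is why you only see a straight line: you are only plotting the x values.

In my opinion, the second argument to the scatter command should be X(cluster1,2) or X(cluster2,2), depending on which scatter command in the code is being used.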

Answer 2

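I decided to post an answer to your question (where your question was determined by a maximum-likelihood guess :P), but I wrote an extensive introduction. Please read it carefully, because I think you have difficulty understanding the methods you want to use, and you also have difficulty understanding why others can't help you with your usual way of asking questions. There are several problems with your question, both code-related and conceptual. Let's start with the latter.

The problem with the problem

You say you want to cluster your image with Gaussian mixture modelling. While I'm not generally familiar with clustering, after a look through your reference and the wonderful SO answer you cited elsewhere (and a quick 101 from @rayryeng) I think you are on the wrong track altogether.

Gaussian mixture modelling, as its name suggests, models your data set with a mixture of Gaussian (i.e. normal) distributions. The reason for the popularity of this method is that when you measure all sorts of quantities, in many cases you will find that your data is mostly distributed like a normal distribution (which is actually the reason why it's called normal). The reason behind this is the central limit theorem, which implies that the sum of reasonably independent random variables tends to be normal in many cases.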
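For reference, the density that a mixture model with K Gaussian components fits is usually written as below. The symbols are the standard textbook notation and are not taken from the question: the weights π_k are the mixing proportions, and μ_k and Σ_k are the mean and covariance of component k.

```latex
p(\mathbf{x}) = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\!\left(\mathbf{x} \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\right),
\qquad \pi_k \ge 0, \qquad \sum_{k=1}^{K} \pi_k = 1
```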

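Now, clustering, on the other hand, simply means separating your data set into disjoint smaller bunches based on some criteria. The main criterion is usually (some kind of) distance, so you want to find "close lumps of data" in your larger data set. You usually need to cluster your data before performing a GMM, because it's already hard enough to find the Gaussians underlying your data without having to guess the clusters too. I'm not familiar enough with the procedures involved to tell how well GMM algorithms work if you just let them loose on your raw data (but I would expect that many implementations start with a clustering step anyway).

To get closer to your question: I guess you want to do some kind of image recognition. Looking at the picture, you want to get the more strongly correlated lumps. This is clustering. If you look at a picture of a zoo, you'll see, say, an elephant and a snake. Both have their distinct shapes, and they are separated from one another. If you cluster your image (and the snake is not riding the elephant, neither did it eat it), you'll find two lumps: one elephant-shaped lump and one snake-shaped lump. Now, it wouldn't make sense to use GMM on these data sets: elephants, and especially snakes, are not shaped like multivariate Gaussian distributions. But you don't need this in the first place if you just want to know where the distinct animals are located in your picture.

Still staying with the example, you should make sure that you cluster your data into an appropriate number of subsets. If you try to cluster your zoo picture into 3 clusters, you might get a second, spurious snake: the elephant's trunk. With an increasing number of clusters your partitioning might make less and less sense.

Your approach

Your code doesn't give you anything reasonable, and there's a very good reason for that: it doesn't make sense from the start. Look at the beginning: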

I=im2double(imread('sil10001.pbm'));
K = I(:);
mu=mean(K);
sigma=std(K);
X = mvnrnd(mu,sigma,1110);
X=reshape(X,111,10);

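You read your binary image, convert it to double, then stretch it out into a vector and compute the mean and deviation of that vector. You basically smear your whole image into 2 values: an average intensity and a deviation. Then you generate 111*10 normally distributed points with these parameters (note that mvnrnd interprets its second argument as a variance, not as a standard deviation) and try to do GMM on them, looking at the first two sets of 111. These are all independent normal samples with the same parameters, so you will probably get two overlapping Gaussians with roughly the same mean and deviation.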
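A small sketch may make this concrete. It uses base MATLAB's randn in place of mvnrnd, and the values of mu and sigma are made-up stand-ins for the statistics of a binary image, so treat the numbers as assumptions. It only shows that every column of the reshaped matrix estimates the same mean and spread, which leaves a mixture model nothing to separate.

```matlab
% Hypothetical stand-ins for mean(K) and std(K) of a binary image
mu = 0.3;
sigma = 0.46;

rng(0);   % fixed seed so the run is repeatable
% mvnrnd(mu,sigma,1110) treats sigma as a variance, so the equivalent
% draw with base MATLAB is mu + sqrt(sigma)*randn
X = mu + sqrt(sigma) * randn(1110, 1);
X = reshape(X, 111, 10);   % 111 rows, 10 interchangeable columns

colMeans = mean(X);   % 1x10, every entry estimates mu
colStds  = std(X);    % 1x10, every entry estimates sqrt(sigma)
disp([colMeans; colStds]);
```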

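I think you were confused by the examples you found online. When you do GMM, you already have your data, so no pseudo-normal numbers should be involved. But when people post examples, they also try to provide reproducible inputs (well, some of them do, nudge nudge wink wink). A simple method for this is to generate a union of simple Gaussians, which can then be fed into GMM.

So, my point is that you don't have to generate random numbers, but have to use the image data itself as input to your procedure. And you probably just want to cluster your image, instead of actually using GMM to draw potatoes over your clusters, since you want to cluster body parts in an image of a human. Most body parts are not shaped like multivariate Gaussians (with a few distinct exceptions for men and women).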
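If you nevertheless want to see what a GMM does with the image data itself, a minimal sketch could look like the following. It assumes the Statistics and Machine Learning Toolbox (fitgmdist and the mu property of the fitted model), uses a synthetic two-blob image in place of the silhouette file, and picks 2 components arbitrarily, so it is an illustration rather than a recommendation.

```matlab
% Synthetic binary image standing in for the silhouette (an assumption)
I = zeros(60, 80);
I(10:25, 10:30) = 1;   % first white blob
I(35:55, 50:75) = 1;   % second white blob

% Row and column indices of the white pixels are the data points
[datax, datay] = find(I > 0.5);

rng(1);   % repeatable initialisation
% 2 components is a guess for this toy image; Replicates reruns the fit
% from several starting points and keeps the best one
gm = fitgmdist([datax datay], 2, 'Replicates', 5);

centers = gm.mu;   % 2x2 matrix, one [row col] mean per component
disp(centers);
```

The rows of centers are in the same (row, column) index space as the image, so they can be drawn on top of the image once the axes are set up to match.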

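What I think you should do

If you really want to cluster your image, like in the figure you added to your question, then you should use a method like k-means. But then again, you already have a program that does that, don't you? So I don't really think I can answer the question "How can I cluster my image with GMM?". Instead, here's an answer to "How can I cluster my image?" with k-means, but at least there will be a piece of code here.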

%set infile to what your image file will be
infile='sil10001.pbm';

%read file
I=im2double(imread(infile));

%determine 'white' pixels, size of image can be [M N], [M N 3] or [M N 4]
Idims=size(I);
whites=true(Idims(1),Idims(2));

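%a pixel is 'white' only if every color channel is brighter than 0.5
%(size(I,3) is 1 for a plain [M N] image, where Idims has no third entry)
for colori=1:size(I,3)
    whites=whites & I(:,:,colori)>0.5;
end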

%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(whites);

%cluster data into 10 clumps
K = 10;               % number of mixtures/clusters
cInd = kmeans([datax datay], K, 'EmptyAction','singleton',...
    'maxiter',1000,'start','cluster');

%get clusterwise means
meanx=zeros(K,1);
meany=zeros(K,1);
for i=1:K
    meanx(i)=mean(datax(cInd==i));
    meany(i)=mean(datay(cInd==i));
end

figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(meany,-meanx,20,'ko'); %same funky coordinates

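Here's what this does. It first reads your image as you did. Then it tries to determine "white" pixels by checking that each color channel (of which there can be 1, 3 or 4) is brighter than 0.5. The input data points for the clustering are then the x and y "coordinates" (i.e. indices) of the white pixels.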
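The white-pixel test can also be written without a loop. The sketch below is an alternative formulation, not the code from this answer, and it runs on a synthetic RGB image that is assumed purely for the demonstration.

```matlab
% Synthetic RGB image: a white rectangle on black (an assumption for the demo)
I = zeros(20, 30, 3);
I(5:10, 8:15, :) = 1;

% A pixel counts as white only if every channel is brighter than 0.5.
% Reducing along dimension 3 also works for a plain [M N] image.
whites = all(I > 0.5, 3);

[datax, datay] = find(whites);   % row and column indices of white pixels
fprintf('%d white pixels\n', numel(datax));
```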

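Next it does the clustering via kmeans. This part of the code is loosely based on the already cited answer of Amro. I had to set a large maximal number of iterations, as the problem is ill-posed in the sense that there aren't 10 clear clusters in the picture. Then we compute the mean for each cluster, and plot the clusters with gscatter and the means with scatter. Note that in order to have the picture facing in the right direction in a scatter plot, you have to shift the input coordinates around. Alternatively, you could define datax and datay correspondingly at the beginning.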
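Another way to avoid the coordinate juggling, and to get the means drawn on the binary image itself as the question asks, is to display the image first and plot on top of it. The following is a sketch on a synthetic image. The means are computed from two known blobs only so that there is something to plot, so meanx and meany here are stand-ins for the clusterwise means computed above.

```matlab
% Synthetic binary image with two blobs (an assumption, stands in for the silhouette)
I = zeros(60, 80);
I(10:25, 10:30) = 1;
I(35:55, 50:75) = 1;

% Stand-in cluster means: simply the mean row/column of each known blob
[r1, c1] = find(I(:, 1:40));    % pixels of the left blob
[r2, c2] = find(I(:, 41:80));   % pixels of the right blob
meanx = [mean(r1); mean(r2)];        % row coordinates of the means
meany = [mean(c1); mean(c2) + 40];   % column coordinates (undo the 40-column offset)

figure;
imagesc(I); colormap(gray);   % image axes: rows increase downwards
axis image;
hold on;
plot(meany, meanx, 'r+', 'MarkerSize', 12, 'LineWidth', 2);   % x = column, y = row
hold off;
```

Because imagesc puts the row index on a downward-pointing y axis, the column index goes in as x and the row index as y, with no sign flips.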

Here is my output, run on the already processed figure you provided in your question:

Answer 3

The code can be simpler:

%read file
I=im2double(imread('sil10340.pbm'));
%choose indices of 'white' pixels as coordinates of data
[datax datay]=find(I);
%cluster data into 10 clumps
K = 10;               % number of mixtures/clusters
[cInd, c] = kmeans([datax datay], K, 'EmptyAction','singleton',...
    'maxiter',1000,'start','cluster');
figure;
gscatter(datay,-datax,cInd); %funky coordinates for plotting according to image
axis equal;
hold on;
scatter(c(:,2),-c(:,1),20,'ko'); %same funky coordinates

I don't think the loop is needed, since c itself is returned as a 10x2 double array containing the positions of the means.
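The same second output can be used for the 10-image case from the question. One likely cause of the empty cells there is that the inner loop reuses the variable i, so after it finishes i equals K and only xc{K} and xb{K} are ever written, each with a single element. The sketch below uses a separate image index n and stores the whole centroid columns. The frames are synthetic stand-ins for images.result, and K = 2 is chosen only to suit this toy data.

```matlab
% Ten synthetic binary frames standing in for images.result (an assumption)
nImages = 10;
frames = cell(nImages, 1);
for n = 1:nImages
    frame = zeros(60, 80);
    frame(10:25, (10:30) + n) = 1;   % blob drifting to the right
    frame(35:55, (40:65) + n) = 1;
    frames{n} = frame;
end

K = 2;                    % clusters per frame, chosen for this toy data
xc = cell(nImages, 1);    % row coordinates of the K means, per image
xb = cell(nImages, 1);    % column coordinates of the K means, per image

rng(0);
for n = 1:nImages         % n indexes images and is not reused inside the loop
    [datax, datay] = find(frames{n});
    [~, c] = kmeans([datax datay], K, 'EmptyAction', 'singleton', ...
        'MaxIter', 1000, 'Start', 'cluster');
    xc{n} = c(:, 1);      % whole K-by-1 vector, not a single element
    xb{n} = c(:, 2);
end
```

After the loop, xc{n} and xb{n} hold the K centroid coordinates of image n, so no cell is left empty.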


Tags: matlab, classification, cluster-analysis, machine-learning, mixture-model
