Clustering of 1 dimensional data

Question

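I am trying to learn the k-means clustering algorithm in MATLAB without using the built-in k-means function. Suppose I have data of size 1x100 and I want to split it into two clusters. How can I do that? I would like to visualize the two centroids together with the data on a MATLAB plot. Note: when I plot in MATLAB, I can only see the data, not the data and the two centroids at the same time.

Any help in this regard is highly appreciated.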

Answer 1

A minimal k-means clustering algorithm in MATLAB could be:

p = rand(100,2); % rand(number_of_points,number_of_dimension)
c = p(1:3,:);    % We create 3 centroids

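% We run this minimal k-means algorithm:
for ii = 1:10
    % Which centroid is the closest to each point? min(squared Euclidean distance):
    % permute turns p into 1x2x100, so subtracting c (3x2) expands to 3x2x100
    % (implicit expansion, R2016b or later).
    [~,idx] = min(sum((permute(p,[3,2,1])-c).^2,2),[],1);
    % We compute the new centroids (the center of mass of the corresponding points).
    % mean(x,1) averages over rows even when a cluster holds a single point.
    c = splitapply(@(x) mean(x,1),p,idx(:));
end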

If needed, we can plot the result:

hold on
scatter(p(:,1),p(:,2),[],idx(:))
scatter(c(:,1),c(:,2),[],'red')

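The resulting plot shows our 3 centroids in red and the clusters in different colors. Note that in this example the data is 2-dimensional, but the algorithm also works for any other number of dimensions.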
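For the 1x100 case in the question, the same idea can be sketched as follows. This is only an illustration: the data `x`, the seed, and the choice to draw everything at height 0 are my assumptions, not something given in the question. The `hold on` call is what keeps the data and the centroids on the same axes.

```matlab
rng(0);                               % fixed seed, only so the sketch is repeatable
x = [randn(1,50), randn(1,50) + 5];   % hypothetical 1x100 data with two groups
k = 2;
c = x(randperm(numel(x), k));         % k distinct data points as initial centroids (1xk)

for ii = 1:10
    % Distance from every centroid (rows) to every point (columns): k x 100
    [~, idx] = min(abs(c(:) - x), [], 1);
    % Move each centroid to the mean of the points assigned to it
    for j = 1:k
        if any(idx == j)
            c(j) = mean(x(idx == j));
        end
    end
end

% Plot the data and the centroids on the same axes, both at height 0
figure; hold on
scatter(x, zeros(size(x)), 20, idx, 'filled')                    % points colored by cluster
plot(c, zeros(size(c)), 'rx', 'MarkerSize', 14, 'LineWidth', 2)  % centroids as red crosses
hold off
```

An explicit inner loop replaces `splitapply` here so that an empty cluster simply keeps its previous centroid instead of raising an error.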

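The 3 initial centroids are 3 points of the dataset (randomly chosen), which ensures that each centroid is the closest centroid of at least 1 point.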
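If the data is not already in random order, one way to pick the initial centroids at random is `randperm`. A small sketch, where `k`, `sel` and `nonEmpty` are names I introduce for illustration:

```matlab
p = rand(100,2);                    % same kind of data as in the answer
k = 3;
sel = randperm(size(p,1), k);       % k distinct row indices, drawn at random
c = p(sel,:);                       % initial centroids are actual data points

% Squared distance from every centroid to every point: k x 1 x 100
d2 = sum((permute(p,[3,2,1]) - c).^2, 2);
[~, idx] = min(d2, [], 1);
idx = idx(:);                       % 100x1 vector of cluster labels

% Each chosen point is at distance 0 from its own centroid, so as long as
% the chosen points are distinct, every cluster has at least one member.
nonEmpty = numel(unique(idx)) == k;
```

This matters for the `splitapply` call in the loop, which expects every group number from 1 to k to occur at least once.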

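In this example there are 10 iterations. But it would certainly be better to define a tolerance and stop iterating once the centroids have converged.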
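A possible way to do that, as a rough sketch: stop when no centroid moves by more than a tolerance between two iterations. The values of `tol` and `maxIter` below are arbitrary assumptions, and measuring convergence by the largest centroid displacement is just one reasonable choice.

```matlab
p = rand(100,2);
k = 3;
c = p(1:k,:);      % initial centroids, as in the answer
tol = 1e-6;        % assumed tolerance on centroid movement
maxIter = 100;     % assumed safety cap on the number of iterations

for ii = 1:maxIter
    % Assign each point to its closest centroid
    [~, idx] = min(sum((permute(p,[3,2,1]) - c).^2, 2), [], 1);
    idx = idx(:);
    % Recompute the centroids; an empty cluster keeps its old centroid
    cNew = c;
    for j = 1:k
        if any(idx == j)
            cNew(j,:) = mean(p(idx == j,:), 1);
        end
    end
    shift = max(sqrt(sum((cNew - c).^2, 2)));   % largest centroid displacement
    c = cNew;
    if shift < tol
        break
    end
end
fprintf('Stopped after %d iterations (last shift %.2g)\n', ii, shift);
```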

