Hierarchical Clustering in MATLAB
I clustered my data X using hierarchical clustering as follows:
X = [1 1 1;
2 2 2;
1 1 0;
1 2 2];
Y = pdist(X);
T = linkage(Y, 'complete');
c = cluster(T,'maxclust',2);
So X(1,:) and X(3,:) belong to cluster #1, and the others belong to cluster #2.
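How can I determine which cluster a new data point (one that is not in X) should be assigned to? For example, which cluster does [1 0 1] belong to?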
A simple solution is to find the nearest cluster centroid.
Nearest centroid
x_new = [1 0 1];
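% Compute one centroid per cluster (one row of X_c per cluster id)
X_c = zeros(numel(unique(c)), size(X,2));
for cid = unique(c)'
    X_c(cid,:) = mean(X(c == cid,:), 1);  % dim 1, so a one-point cluster still yields a row
end
% Pick the cluster whose centroid is closest to x_new
[~,c_new] = min(pdist2(x_new,X_c));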
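As a worked check of the nearest-centroid rule on the four points from the question, assuming the two clusters are {X(1,:), X(3,:)} and {X(2,:), X(4,:)} as stated there. The symbols \mu_A and \mu_B are introduced here only for illustration and do not appear in the code:

```latex
\begin{aligned}
\mu_A &= \tfrac{1}{2}\big([1,1,1] + [1,1,0]\big) = [1,\ 1,\ 0.5] \\
\mu_B &= \tfrac{1}{2}\big([2,2,2] + [1,2,2]\big) = [1.5,\ 2,\ 2] \\
\lVert x_{\text{new}} - \mu_A \rVert &= \sqrt{0^2 + 1^2 + 0.5^2} = \sqrt{1.25} \approx 1.118 \\
\lVert x_{\text{new}} - \mu_B \rVert &= \sqrt{0.5^2 + 2^2 + 1^2} = \sqrt{5.25} \approx 2.291
\end{aligned}
```

So [1 0 1] lands in the cluster that contains X(1,:) and X(3,:).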
If you have more samples and want to take variance into account, you can compute a z-score of the Euclidean distance.
Z-score of distance
x_new = [1 0 1];
X_means = zeros(1,numel(unique(c)));
X_stds = zeros(1,numel(unique(c)));
X_c = zeros(numel(unique(c)), size(X,2));
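for cid = unique(c)'
    X_c(cid,:) = mean(X(c == cid,:), 1);            % cluster centroid
    distances = pdist2(X(c == cid,:), X_c(cid,:));  % member-to-centroid distances
    X_means(cid) = mean(distances);
    X_stds(cid) = std(distances);                   % 0 if all members are equidistant (e.g. a two-point cluster)
end
% Choose the cluster with the smallest z-score of the distance from x_new to its centroid
[~,c_new] = min((pdist2(x_new,X_c) - X_means)./X_stds);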
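One caveat with the four-row example: each cluster has only two points, and two points are always equidistant from their midpoint, so std(distances) is 0 and the z-score divides by zero. Below is a minimal sketch of one way to guard against that. It is not part of the original answer and rests on two assumptions: a hypothetical floor `eps_std` on the standard deviation, and hard-coded labels `c` standing in for the output of `cluster`. It uses only base MATLAB functions.

```matlab
X = [1 1 1; 2 2 2; 1 1 0; 1 2 2];
c = [1; 2; 1; 2];      % assumed labels; cluster() may number the groups differently
x_new = [1 0 1];
eps_std = 1e-12;       % hypothetical floor, chosen for illustration only

ids = unique(c)';
K = numel(ids);
d_mean = zeros(1, K);  % mean member-to-centroid distance per cluster
d_std  = zeros(1, K);  % std of member-to-centroid distances per cluster
d_new  = zeros(1, K);  % distance from x_new to each centroid
for k = 1:K
    Xk = X(c == ids(k), :);
    mu = mean(Xk, 1);                                        % centroid as a row vector
    d  = sqrt(sum((Xk - repmat(mu, size(Xk,1), 1)).^2, 2));  % Euclidean distances
    d_mean(k) = mean(d);
    d_std(k)  = std(d);
    d_new(k)  = sqrt(sum((x_new - mu).^2));
end

% Floor the standard deviation so the division stays finite
z = (d_new - d_mean) ./ max(d_std, eps_std);
[~, k_best] = min(z);
c_new = ids(k_best);
```

When every cluster hits the floor, as it does here, the ordering of `z` is driven by `d_new - d_mean`, so the z-score only adds information once the clusters have enough points for the spread to be non-zero.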
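If you want to account for per-component variance, you can take the z-score of the component-wise distances (I'm not sure how this result differs from the one above...).
Mean z-score of component-wise distance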
x_new = [1 0 1];
X_means = zeros(numel(unique(c)),size(X,2));
X_stds = zeros(numel(unique(c)),size(X,2));
X_c = zeros(numel(unique(c)), size(X,2));
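for cid = unique(c)'
    members = X(c == cid,:);
    X_c(cid,:) = mean(members, 1);   % cluster centroid
    % Absolute per-component offsets of each member from the centroid
    comp_distances = abs(members - repmat(X_c(cid,:),[size(members,1),1]));
    X_means(cid,:) = mean(comp_distances, 1);
    X_stds(cid,:) = std(comp_distances, 0, 1);
end
% Offsets of x_new from each centroid, taken as absolute values to match comp_distances
new_distances = abs(repmat(x_new,[size(X_c,1),1]) - X_c);
[~,c_new] = min(mean((new_distances - X_means)./X_stds,2));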
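Written out, the score that this last variant minimises over clusters k looks like the following. The symbols are introduced here for illustration only: \mu_{k,j} is component j of centroid k, m_{k,j} and s_{k,j} are the mean and standard deviation of the within-cluster absolute offsets |x_{i,j} - \mu_{k,j}|, and p is the number of components. The absolute value on the new point's offset is an assumption, made so that it is measured the same way as the within-cluster offsets.

```latex
\bar{z}_k = \frac{1}{p} \sum_{j=1}^{p}
            \frac{\lvert x_{\text{new},j} - \mu_{k,j} \rvert - m_{k,j}}{s_{k,j}},
\qquad
\hat{k} = \arg\min_{k} \bar{z}_k
```

The distance-based variant before it uses a single Euclidean distance per cluster instead of p per-component offsets, so the two can disagree when a cluster is much more spread out along one component than along the others.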