Unexpected clustering error (partitioning around medoids)

Question

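I am using the fpc package to determine the optimal number of clusters. The pamk() function takes a dissimilarity matrix as an argument and does not require the user to specify k. According to the documentation: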

pamk() This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.
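For reference, pamk()'s default criterion (criterion = "asw") picks the number of clusters with the largest average silhouette width. The sketch below imitates that idea with cluster::pam() directly. choose_k_asw() is a hypothetical helper, not part of fpc, and it only approximates what pamk() does internally. It also makes the relevant constraint visible: pam() only accepts k between 1 and n - 1, so for small inputs the candidate values of k have to be trimmed to n - 1.

```r
library(cluster)  # provides pam()

# Hypothetical helper (not part of fpc): choose k by average silhouette
# width (ASW), roughly the idea behind fpc::pamk(criterion = "asw").
choose_k_asw <- function(d, krange = 2:10) {
  d <- as.dist(d)                     # read the input as dissimilarities
  n <- attr(d, "Size")                # number of observations
  # pam() rejects k >= n, so keep only k in 2..(n - 1)
  krange <- krange[krange >= 2 & krange <= n - 1]
  if (length(krange) == 0) stop("need at least 3 observations")
  # average silhouette width for each candidate k
  asw <- sapply(krange, function(k) pam(d, k, diss = TRUE)$silinfo$avg.width)
  list(nc = krange[which.max(asw)], asw = setNames(asw, krange))
}
```

For example, choose_k_asw(as.dist(bar)) only tries k = 2, 3 and 4, because bar has just 5 observations.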

But when I pass in two very similar matrices, foo and bar (data below), the function fails on the second one (bar):

Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2

Given that the input matrices are essentially the same, what could be causing this error? For example:

foo works!

hc <- hclust(as.dist(foo))
plot(hc)
pamk.best <- fpc::pamk(foo)
pamk.best$nc
[1] 2

bar does not:

hc <- hclust(as.dist(bar))
plot(hc, main = 'bar dendrogram')
pamk.best <- fpc::pamk(bar)
Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2

Any suggestions would be helpful!

dput(foo)
structure(c(0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0), .Dim = c(14L, 14L), .Dimnames = list(
    c("etc", "etc", "etc", "etc", "etc", "etc", "etc", "similares", 
    "etc", "etc", "etc", "etc", "etc", "similares"), NULL))

dput(bar)
structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0, 
0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez", 
"similares", "similares", "similares", "similares"), NULL))

Answer 1

bar has n = 5 columns (observations), so max(krange) must be <= n - 1, i.e. 4. The default krange is 2:10, hence the error. You probably need to pass an appropriate krange; try:

pamk.best <- fpc::pamk(bar, krange=c(2:(dim(bar)[2]-1)))
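Following the same reasoning, here is a small sketch that derives krange from the number of observations instead of hard-coding it. safe_krange() is a hypothetical helper. The pamk() call only runs if fpc is installed, and passing as.dist(bar) with diss = TRUE is an assumption about how bar is meant to be read: it tells pamk() explicitly that bar holds dissimilarities rather than raw observations.

```r
# Hypothetical helper: drop any k that pam() would reject.
# pam() accepts k in 1..(n - 1), where n is the number of observations.
safe_krange <- function(x, krange = 2:10) {
  n <- if (inherits(x, "dist")) attr(x, "Size") else nrow(as.matrix(x))
  krange[krange >= 1 & krange <= n - 1]
}

bar <- structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
  0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez",
  "similares", "similares", "similares", "similares"), NULL))

print(safe_krange(bar))  # only k = 2, 3, 4 survive for 5 observations

if (requireNamespace("fpc", quietly = TRUE)) {
  # Assumption: bar is a dissimilarity matrix, so say so explicitly.
  pamk.best <- fpc::pamk(as.dist(bar), krange = safe_krange(bar), diss = TRUE)
  print(pamk.best$nc)
}
```

The same call works unchanged for foo, where n = 14 leaves the full default range 2:10 intact.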

Tags: nlp, r, cluster-analysis, k-means, unsupervised-learning