Unexpected clustering errors (partitioning around medoids)

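I am using the fpc package to determine the optimal number of clusters. The pamk() function takes a dissimilarity matrix as an argument and does not require the user to specify k. According to the documentation: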

pamk() This calls pam and clara for the partitioning around medoids clustering method (Kaufman and Rousseeuw, 1990) and includes two different ways of estimating the number of clusters.

But when I pass in two very similar matrices, foo and bar (data below), the function fails on the second matrix (bar) with:

Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2 

Given that the input matrices are essentially the same, what could be causing this error? For example:

foo works!

hc <- hclust(as.dist(foo))
plot(hc)
pamk.best <- fpc::pamk(foo)
pamk.best$nc
[1] 2

bar does not!

hc <- hclust(as.dist(bar))
plot(hc, main = 'bar dendrogram')
pamk.best <- fpc::pamk(bar)
Error in pam(sdata, k, diss = diss, ...) : 
  Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2

Any suggestions would be helpful!

dput(foo)
structure(c(0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9, 
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0), .Dim = c(14L, 14L), .Dimnames = list(
    c("etc", "etc", "etc", "etc", "etc", "etc", "etc", "similares", 
    "etc", "etc", "etc", "etc", "etc", "similares"), NULL))

dput(bar)
structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0, 
0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez", 
"similares", "similares", "similares", "similares"), NULL))

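bar has only n = 5 rows and columns, and pam() requires k <= n - 1, so max(krange) can be at most 4. The default krange is 2:10, which is why you get the error. You probably need to pass an appropriate krange; try: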

pamk.best <- fpc::pamk(bar, krange = 2:(ncol(bar) - 1))
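
If you run this on many matrices of different sizes, it may be easier to cap krange automatically. The following is only a minimal sketch: safe_krange is a hypothetical helper, not part of fpc, and it assumes that the only constraint that matters is the k <= n - 1 requirement from the error message.

```r
# Hypothetical helper (not part of fpc): keep only the candidate values of k
# that pam() can accept for n observations, i.e. 1 <= k <= n - 1.
safe_krange <- function(n, krange = 2:10) {
  krange[krange >= 1 & krange <= n - 1]
}

safe_krange(5)   # candidates for a 5-row input such as bar
safe_krange(14)  # candidates for a 14-row input such as foo

# Usage with the data from the question; runs only if fpc is installed
# and bar has been created from the dput() output above.
if (requireNamespace("fpc", quietly = TRUE) && exists("bar")) {
  pamk.best <- fpc::pamk(bar, krange = safe_krange(nrow(bar)))
  print(pamk.best$nc)
}
```

Note that for very small inputs (n <= 2 with the default candidates) the helper returns an empty vector, so you would still need to handle that case before calling pamk().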