Unexpected clustering errors (partitioning around medoids)
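I am using the fpc package to determine the optimal number of clusters. The pamk() function takes a dissimilarity matrix as an argument and does not require the user to specify k. According to the documentation: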
pamk() This calls pam and clara for the partitioning around medoids
clustering method (Kaufman and Rousseeuw, 1990) and includes two
different ways of estimating the number of clusters.
But when I pass in two very similar matrices, foo and bar (data below), the function fails on the second matrix (bar):
Error in pam(sdata, k, diss = diss, ...) :
Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2
Given that the input matrices are essentially the same, what could be causing this error? For example:
foo works!
hc <- hclust(as.dist(foo))
plot(hc)
pamk.best <- fpc::pamk(foo)
pamk.best$nc
[1] 2
bar does not:
hc <- hclust(as.dist(bar))
plot(hc, main = 'bar dendrogram')
pamk.best <- fpc::pamk(bar)
Error in pam(sdata, k, diss = diss, ...) :
Number of clusters 'k' must be in {1,2, .., n-1}; hence n >= 2
Any suggestions would be helpful!
dput(foo)
structure(c(0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9,
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0,
0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0,
0, 0, 0, 9, 0, 0, 0, 0, 0, 0, 0, 9, 0, 0, 0, 0, 0, 9, 9, 9, 9,
9, 9, 9, 9, 0, 9, 9, 9, 9, 9, 0), .Dim = c(14L, 14L), .Dimnames = list(
c("etc", "etc", "etc", "etc", "etc", "etc", "etc", "similares",
"etc", "etc", "etc", "etc", "etc", "similares"), NULL))
dput(bar)
structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L), .Dimnames = list(c("ramírez",
"similares", "similares", "similares", "similares"), NULL))
bar has n = 5 columns, so max(krange) must be <= n - 1, i.e. at most 4. The default krange is 2:10, which is why the error occurs. You probably need to pass an appropriate krange; try:
pamk.best <- fpc::pamk(bar, krange=c(2:(dim(bar)[2]-1)))
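To avoid hard-coding the upper bound for every matrix, the same idea can be wrapped in a small helper. This is only a sketch: safe_krange, kmin and kmax are hypothetical names and are not part of fpc. It assumes n is the number of observations (rows), which for a square matrix like bar equals the number of columns, and it caps the default 2:10 range at n - 1.

```r
# Hypothetical helper (not part of fpc): cap the candidate cluster counts
# so that no k reaches n, the number of observations.
safe_krange <- function(x, kmin = 2L, kmax = 10L) {
  n <- nrow(as.matrix(x))        # rows; same as columns for a square matrix
  upper <- min(kmax, n - 1L)     # pam() requires k <= n - 1
  if (upper < kmin) {
    stop("too few observations (n = ", n, ") to try k >= ", kmin)
  }
  seq.int(kmin, upper)
}

# bar from the question (dimnames omitted for brevity)
bar <- structure(c(0, 6, 6, 6, 6, 6, 0, 0, 0, 0, 6, 0, 0, 0, 0, 6, 0,
                   0, 0, 0, 6, 0, 0, 0, 0), .Dim = c(5L, 5L))

safe_krange(bar)                 # 2 3 4
safe_krange(matrix(0, 14, 14))   # 2 3 4 5 6 7 8 9 10 (same size as foo)
```

With this in place the call from the answer becomes fpc::pamk(bar, krange = safe_krange(bar)), and the same line can be reused for foo, where the default range of 2:10 is left untouched.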