Find the number of clusters using the clusGap function in R

Can you help me find the ideal number of clusters using the clusGap function? There is a similar example at this link: https://www.rdocumentation.org/packages/factoextra/versions/1.0.7/topics/fviz_nbclust

However, I would like to do this for my own data. My code is below:

library(cluster)
library(factoextra) # provides hcut

df <- structure(
  list(
    Propertie = c(1, 2, 3, 4, 5, 6, 7, 8),
    Latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102, -24.781307, -24.761081, -24.761084),
    Longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796, -49.8875254, -49.8875254, -49.922244),
    waste = c(526, 350, 526, 469, 285, 433, 456, 825)
  ),
  class = "data.frame", row.names = c(NA, -8L)
)

df <- scale(df)

hcluster = clusGap(df, FUN = hcut, K.max = 100, B = 50)
Clustering k = 1,2,..., K.max (= 100): .. Error in sil.obj[, 1:3] : incorrect number of dimensions

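The problem here is that you set K.max to 100, but your dataset contains only eight observations. As stated in the clusGap documentation, K.max is the maximum number of clusters to consider, so in your case K.max cannot be greater than seven.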
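To illustrate the point about K.max, here is a minimal sketch that derives the upper bound from the number of rows instead of hard-coding it. It deliberately swaps hcut for a small base-R helper, `hclust_cut`, which is a hypothetical name introduced only for this example, so that the snippet needs nothing beyond the cluster package. The seed and B = 50 are arbitrary choices.

```r
library(cluster)

# Data from the question
df <- data.frame(
  properties = 1:8,
  latitude = c(-24.779225, -24.789635, -24.763461, -24.794394,
               -24.747102, -24.781307, -24.761081, -24.761084),
  longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste = c(526, 350, 526, 469, 285, 433, 456, 825)
)
x <- scale(df)

# Hypothetical helper (not part of any package): hierarchical clustering in
# the form clusGap expects, a function of (x, k) that returns a list with a
# `cluster` component.
hclust_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x)), k = k))
}

# Cap the requested maximum at n - 1, so that every k still leaves at least
# one cluster containing more than one observation.
k_max <- min(100, nrow(x) - 1)

set.seed(42) # arbitrary seed, only for reproducibility
gap <- clusGap(x, FUNcluster = hclust_cut, K.max = k_max, B = 50, verbose = FALSE)
print(gap)
```

With eight observations, `k_max` evaluates to 7, which matches the limit described above.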

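I am not sure whether clustering is appropriate for such a small dataset. Nevertheless, see the working implementation below. I adapted the plot_clusgap function from the R/Bioconductor phyloseq package to visualize the results.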

library(data.table)
library(cluster)
library(factoextra) # for hcut function
library(ggplot2) # for plotting

df <- data.table(properties = c(1,2,3,4,5,6,7,8),
                latitude = c(-24.779225, -24.789635, -24.763461, -24.794394, -24.747102,-24.781307,-24.761081,-24.761084),
                longitude = c(-49.934816, -49.922324, -49.911616, -49.906262, -49.890796,-49.8875254,-49.8875254,-49.922244),
                waste = c(526, 350, 526, 469, 285, 433, 456,825))

df <- scale(df)

# perform clustering, B = 500 is recommended
hcluster = clusGap(df, FUN = hcut, K.max = 7, B = 500)

# extract results
dat <- data.table(hcluster$Tab)
dat[, k := .I]

# visualize gap statistic
p <- ggplot(dat, aes(k, gap)) + geom_line() + geom_point(size = 3) +
  geom_errorbar(aes(ymax = gap + SE.sim, ymin = gap - SE.sim), width = 0.25) +
  ggtitle("Clustering Results") +
  labs(x = "Number of Clusters", y = "Gap Statistic") +
  theme(plot.title = element_text(size = 16, hjust = 0.5, face = "bold"),
        axis.title = element_text(size = 12, face = "bold"))

print(p) # display the plot

Here is the resulting plot:

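I should note that all of the gap statistic values are negative. This suggests that the optimal number of clusters is k = 1 (i.e., no clustering).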
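Rather than reading the number of clusters off the plot, it can also be picked programmatically. The sketch below uses maxSE from the cluster package with the "firstSEmax" rule, which is one of several selection rules that function offers, and the choice of rule here is an assumption rather than part of the original answer. As in any bootstrap procedure, the selected k may vary with the seed and with B. The helper `hclust_cut` is a hypothetical stand-in for hcut so that only the cluster package is needed.

```r
library(cluster)

# Data from the question, scaled as in the code above
x <- scale(data.frame(
  properties = 1:8,
  latitude = c(-24.779225, -24.789635, -24.763461, -24.794394,
               -24.747102, -24.781307, -24.761081, -24.761084),
  longitude = c(-49.934816, -49.922324, -49.911616, -49.906262,
                -49.890796, -49.8875254, -49.8875254, -49.922244),
  waste = c(526, 350, 526, 469, 285, 433, 456, 825)
))

# Hypothetical helper (not part of any package), standing in for hcut
hclust_cut <- function(x, k) {
  list(cluster = cutree(hclust(dist(x)), k = k))
}

set.seed(1) # arbitrary seed, only for reproducibility
gap <- clusGap(x, FUNcluster = hclust_cut, K.max = 7, B = 500, verbose = FALSE)

# Apply a selection rule to the gap values and their simulation standard errors
tab <- gap$Tab
k_hat <- maxSE(f = tab[, "gap"], SE.f = tab[, "SE.sim"],
               method = "firstSEmax", SE.factor = 1)
k_hat
```

Other rules such as "Tibs2001SEmax" or "globalmax" can be passed through the `method` argument, and they may disagree on a dataset this small.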