k-means clustering - inertia only gets larger

I am trying to use the k-means clustering from faiss on a human pose dataset of body joints. I have 16 body parts, so the data has a dimension of 32. The joints are scaled to a range between 0 and 1. My dataset consists of ~900,000 instances. As mentioned by faiss (see the KMeans clustering section of the faiss FAQ):

As a rule of thumb there is no consistent improvement of the k-means quantizer beyond 20 iterations and 1000 * k training points

Applying this to my problem, I randomly select 50,000 instances for training, because I want to check cluster counts k between 1 and 30.

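Now to my "problem":

The inertia increases steadily as the number of clusters increases (n_cluster on the x-axis):

I tried changing the number of iterations, the number of redos, verbose and spherical, but the results stay the same or get worse. I do not think it is a problem with my implementation; I tested it on a small example with 2D data and very clear clusters, and it worked.

Does the data simply cluster badly, or is there another problem/mistake that I am missing? Maybe the scaling of the values between 0 and 1? Should I try another approach?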

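I found my mistake. I had to increase the parameter max_points_per_centroid. Because I have so many data points, faiss sampled a sub-batch of them for fitting. For a larger number of clusters, this sub-batch is larger. See the faiss FAQ: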

max_points_per_centroid * k: there are too many points, making k-means unnecessarily slow. Then the training set is sampled
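To make the quoted rule concrete, here is a small sketch of the arithmetic. The helper name `n_training_points` is made up for illustration, and the value 256 is what I understand the faiss default for max_points_per_centroid to be, so treat it as an assumption rather than a guarantee.

```python
def n_training_points(n_total: int, k: int, max_points_per_centroid: int = 256) -> int:
    """Points left for training after the subsampling rule quoted above.

    Hypothetical helper; 256 is assumed to be the faiss default.
    """
    return min(n_total, max_points_per_centroid * k)


n_total = 50_000  # size of the random training selection from the question
for k in (1, 10, 30):
    print(k, n_training_points(n_total, k))
# 1 256
# 10 2560
# 30 7680

# With the cap raised to the size of the training set, nothing is dropped:
print(n_training_points(n_total, 30, max_points_per_centroid=n_total))
# 50000
```

So under that assumed default, every k between 1 and 30 is fitted on a different number of points, which makes the raw inertia values incomparable.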

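Inertia is the sum of squared distances over the points used for fitting, so the larger the sub-batch, the more points contribute to that sum and the larger the inertia becomes.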
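The effect can be reproduced without faiss. The following is a minimal sketch that uses a plain NumPy Lloyd's k-means on uniform random data as a stand-in for the pose data. It does not use the faiss implementation: the subsampling step only imitates the rule from the FAQ, the cap of 256 is an assumed default, and the function names are made up for illustration.

```python
import numpy as np


def nearest_sq_dist(x, centroids):
    """Squared distance to the nearest centroid, and its index, for each row of x."""
    d2 = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1), d2.argmin(axis=1)


def lloyd_kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means with random initialisation (not the faiss code)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        _, labels = nearest_sq_dist(x, centroids)
        for j in range(k):
            members = x[labels == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


rng = np.random.default_rng(0)
data = rng.random((5000, 32))      # synthetic stand-in: 32 features in [0, 1]
max_points_per_centroid = 256      # assumed default cap, imitating the FAQ rule

ks = [1, 2, 4, 8]
sampled_inertia, full_inertia = [], []
for k in ks:
    # Imitate the subsampling: at most max_points_per_centroid * k training points.
    n_train = min(len(data), max_points_per_centroid * k)
    subset = data[rng.choice(len(data), size=n_train, replace=False)]
    centroids = lloyd_kmeans(subset, k)
    # Inertia summed over the sub-batch (grows with the sub-batch size) ...
    sampled_inertia.append(nearest_sq_dist(subset, centroids)[0].sum())
    # ... versus inertia summed over the same fixed set of points for every k.
    full_inertia.append(nearest_sq_dist(data, centroids)[0].sum())

for k, s, f in zip(ks, sampled_inertia, full_inertia):
    print(f"k={k}: inertia on sub-batch={s:.0f}, inertia on all points={f:.0f}")
```

The inertia summed over the sub-batch climbs with k simply because the sub-batch grows, while the inertia evaluated on one fixed set of points is comparable across k. Raising max_points_per_centroid so that no subsampling happens, or evaluating every fitted model on the same points, avoids the misleading curve.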

k-means

feature-clustering

faiss