Evaluate clustering performance

Question

My raw data looks like this:

df =

        long  lat  long  lat  long  lat  long  lat
    1   11    6    15    19   23    27   30    34
    2   12    7    16    20   24    28   31    35
    3   13    8    17    21   25    29   32    36
    ...
    96  14    9    18    22   26    30   33    37

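Where: the 1, 2, 3, ..., 96 column is the "taxi_id". This means we have 96 cars.

The other columns give the positions of the cars, read as (long, lat) pairs.

Example: the taxi labeled 1 has the positions (11,6) (15,19) (23,27) (30,34).

So I need to cluster them to see which trajectories these taxi drivers use most often.

To do this, I calculated "some" distance matrix, then calculated its similarity matrix, and applied the final matrix to Affinity Propagation.

Affinity Propagation code: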

from sklearn.cluster import AffinityPropagation

af = AffinityPropagation(preference=-6).fit(X)
cluster_centers_indices = af.cluster_centers_indices_
labels = af.labels_ 

# Some code to calculate number of clusters (3 in this case)
# Some code to check which "taxi_id" related to clusters

The final data looks like this:

final_df = 

               long    lat
        1      11      22
    0   2      33      44
        3      55      66
        ...    ...     ...
        45     12      13
    2   46     14      15
        47     16      17

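I want to evaluate my clusters, but I don't know how. I am not predicting anything, so how can I use the sklearn evaluation metrics? I can not even find a logic (what exactly to evaluate)? Maybe Distance between two clusters (CD)? Do you have any ideas or solution code how to proceed?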

Answer 1

I can not even find a logic (what exactly to evaluate)? Maybe Distance between two clusters (CD)?

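You are right: one approach is to measure the distances between the points inside each cluster. The idea is to test this for different numbers of clusters; in your case you only have 3 clusters (0-2).

The silhouette score, for example, is one such technique.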

https://en.wikipedia.org/wiki/Silhouette_(clustering)
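Since you already have a distance matrix, here is a minimal sketch of how the silhouette score could be computed from it with scikit-learn. The trajectories are synthetic, the distance is plain Euclidean on the flattened coordinates (your "some" distance may differ), and the hand-written `labels` array is only a stand-in for `af.labels_`:

```python
import numpy as np
from sklearn.metrics import pairwise_distances, silhouette_score

rng = np.random.default_rng(0)

# Hypothetical data: 3 synthetic routes, 10 noisy taxis per route,
# each row is 4 (long, lat) pairs flattened into 8 columns.
routes = np.array([
    [0, 0, 1, 1, 2, 2, 3, 3],
    [20, 0, 21, 1, 22, 2, 23, 3],
    [0, 20, 1, 21, 2, 22, 3, 23],
], dtype=float)
X = np.vstack([r + rng.normal(scale=0.3, size=(10, 8)) for r in routes])

# Assumed distance: Euclidean between flattened trajectories.
D = pairwise_distances(X, metric="euclidean")
np.fill_diagonal(D, 0.0)  # a precomputed matrix needs a zero diagonal

# Stand-in for af.labels_ from the question.
labels = np.repeat([0, 1, 2], 10)

# metric="precomputed" tells sklearn that D already holds distances.
score = silhouette_score(D, labels, metric="precomputed")
print(f"silhouette score: {score:.3f}")
```

The score lies between -1 and 1; values close to 1 mean the points sit much closer to their own cluster than to the nearest other cluster.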

Do you have any ideas or solution code how to proceed?

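There are many solutions for this here on Stack Overflow:

Another approach that may suit you:

All of these methods try to answer the same question: how many clusters should I choose? If you already know how many clusters you want, they can help you judge the risk and quality of the clustering.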
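A rough sketch of the "test different numbers of clusters" idea follows. Note that it uses KMeans on synthetic blobs instead of Affinity Propagation, because KMeans lets you set the number of clusters directly; with Affinity Propagation you would vary `preference` instead. The data and the range of `k` are assumptions made for illustration:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic stand-in data: three well-separated groups of points.
X, _ = make_blobs(
    n_samples=150,
    centers=[[0, 0], [10, 10], [-10, 10]],
    cluster_std=0.5,
    random_state=0,
)

# Cluster with several candidate values of k and score each result.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# The k with the highest silhouette score is the suggested choice.
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("best k:", best_k)
```

On real trajectories the peak is usually less clear-cut, so it is worth looking at the whole curve and not only at the maximum.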

Answer 2

The clusteval library may be useful. It contains five methods that can be used to evaluate clusterings: silhouette, dbindex, derivative, dbscan and hdbscan.

pip install clusteval

I suggest you use dbscan:

# Import library
from clusteval import clusteval

# Set parameters
ce = clusteval(method='dbscan')

# Fit to find optimal number of clusters using dbscan
out = ce.fit(df.values)

# Make plot of the cluster evaluation
ce.plot()

# Make scatter plot. Note that the first two coordinates are used for plotting.
ce.scatter(df.values)
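If you would like to check labels you already have (for example `af.labels_`) without re-clustering, scikit-learn also offers the Davies-Bouldin index, which as far as I can tell is what "dbindex" refers to (that mapping is my assumption). A minimal sketch on synthetic points, comparing a sensible labelling with a deliberately scrambled one:

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(0)

# Hypothetical data: three groups of 20 points around distinct centers.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# Labelling that matches the groups (stand-in for a good clustering).
good_labels = np.repeat([0, 1, 2], 20)
# Labelling that mixes all groups together (a deliberately bad clustering).
bad_labels = np.tile([0, 1, 2], 20)

# Lower Davies-Bouldin values indicate more compact, better separated clusters.
db_good = davies_bouldin_score(X, good_labels)
db_bad = davies_bouldin_score(X, bad_labels)
print(f"good labelling: {db_good:.3f}")
print(f"bad labelling:  {db_bad:.3f}")
```

Unlike the silhouette score, lower is better here, with 0 as the best possible value.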
