Printing number (count) or percentage of instances within Simple K Means clusters

Question

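I am trying to get the number of instances in each cluster, either as a count or as a percentage. For the results of a WEKA Simple K Means clustering, I have written the loop below to print the cluster membership...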

System.out.println("\n\nCluster membership:");
for (int i = 0; i < m_instances.numInstances(); i++) {
    try {
        int id = (int) m_instances.instance(i).index(i);
        temp.append("\nCluster " + clusterInstance(m_instances.instance(i)) + " contains Instance: " + id);
    } catch (Exception e) {
        e.printStackTrace();
    }
}

This produces output like this...

Cluster 0 contains Instance: 0
Cluster 0 contains Instance: 1
Cluster 0 contains Instance: 2
Cluster 0 contains Instance: 3
Cluster 0 contains Instance: 4
Cluster 1 contains Instance: 5
Cluster 1 contains Instance: 6

...and so on.

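Is there a way, using a for loop like the one in my code above, to get the number of instances within each cluster, so that the output looks something like this...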

Cluster 0 contains 5 Instances (71%)
Cluster 1 contains 2 Instances (28%)

Answer 1

You can use a Map to keep track of the number of instances per cluster, and then calculate the percentage for each cluster:

import java.util.HashMap;
import java.util.Map;

// Maps each cluster id to the number of instances assigned to it
Map<Integer, Integer> map = new HashMap<>();
int amountOfInstances = m_instances.numInstances();
for (int i = 0; i < amountOfInstances; i++) {
  try {
    // A merge to either add a new cluster with count=1,
    // or increase the count by 1 for an already existing cluster in the map
    map.merge(clusterInstance(m_instances.instance(i)), 1, Integer::sum);
  } catch (Exception e) {
    e.printStackTrace();
  }
}

for (Map.Entry<Integer, Integer> keyValuePair : map.entrySet()) {
  int cluster = keyValuePair.getKey();
  int count = keyValuePair.getValue();
  // The cast to int truncates, so 28.57 is printed as 28
  int percentage = (int)(100d / amountOfInstances * count);
  System.out.println("Cluster " + cluster + " contains " + count + " Instances (" + percentage + "%)");
}

This results in:

Cluster 0 contains 5 Instances (71%)
Cluster 1 contains 2 Instances (28%)
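
If you want to try the counting step without a WEKA setup, here is a minimal standalone sketch. The class name ClusterCounts, its helper methods, and the hard-coded assignments array are all made up for illustration; the array simply stands in for the values clusterInstance(...) would return for the seven instances in the question. It uses a TreeMap so the clusters print in id order, and it shows a rounded percentage next to the truncated one, since truncation is why 71% and 28% do not add up to 100%.

```java
import java.util.Map;
import java.util.TreeMap;

class ClusterCounts {

    // Counts how many instances were assigned to each cluster id.
    // A TreeMap keeps the cluster ids sorted when iterating.
    public static Map<Integer, Integer> countPerCluster(int[] assignments) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (int cluster : assignments) {
            // Insert the cluster with count 1, or add 1 to its existing count
            counts.merge(cluster, 1, Integer::sum);
        }
        return counts;
    }

    // Percentage with the fractional part cut off (cast to int)
    public static int truncatedPercent(int count, int total) {
        return (int) (100d * count / total);
    }

    // Percentage rounded to the nearest whole number
    public static long roundedPercent(int count, int total) {
        return Math.round(100d * count / total);
    }

    public static void main(String[] args) {
        // Hypothetical assignments standing in for clusterInstance(...) results
        int[] assignments = {0, 0, 0, 0, 0, 1, 1};
        Map<Integer, Integer> counts = countPerCluster(assignments);
        for (Map.Entry<Integer, Integer> entry : counts.entrySet()) {
            int count = entry.getValue();
            System.out.println("Cluster " + entry.getKey() + " contains " + count
                + " Instances (truncated " + truncatedPercent(count, assignments.length)
                + "%, rounded " + roundedPercent(count, assignments.length) + "%)");
        }
    }
}
```

With real data you would replace the array with a loop over m_instances, as in the code above. Whether you truncate or round is a matter of taste; rounding keeps the percentages closer to a total of 100.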


java

weka

k-means