Clustering data and finding minimum and maximum value of a cluster
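I have a text file containing a long 2D array. The first element of each row is a number between 1 and 6.
I want to cluster the lines. Each value from 1 to 6 has two clusters. How can I determine the minimum and maximum of each cluster for this data (the range here is 0 to 6)?
Looking at the blue clusters, I want to use the minimum and maximum of each cluster as that cluster's boundaries. Which algorithm can solve this? I need to find the min/max values for all clusters on these 6 lines.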
You should use k-means for the clustering, and then a dictionary mapping to get the min/max values:
Code:
import numpy as np
from scipy.cluster.vq import kmeans, vq
from collections import defaultdict
dd = defaultdict(list)
arr = [[1, 2], [3, 585], [2, 0], [1, 500], [2, 668], [3, 54], [4, 28], [3, 28], [4, 163], [3, 85], [4, 906], [2, 5000], [2, 358], [4, 69], [3, 89], [4, 258], [2, 632], [4, 585], [3, 47]]
for k in arr:
    dd[k[0]].append(k[1])  # build a dict: first element of each pair is the key, second element is appended to its list
dd = dict(dd)
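The answer hard-codes `arr`, while the question says the data lives in a text file. A minimal sketch of building the same key-to-values mapping directly from the file, assuming one whitespace-separated `key value` pair per line with an integer key (the real file format is not shown in the question, so this layout is an assumption, and `group_pairs` is a hypothetical helper name):

```python
import io
from collections import defaultdict
from typing import Dict, Iterable, List


def group_pairs(lines: Iterable[str]) -> Dict[int, List[float]]:
    """Group the second column of each line by its first column.

    Assumes (hypothetically) one whitespace-separated "key value" pair per
    line, with an integer key such as 1..6.
    """
    groups: Dict[int, List[float]] = defaultdict(list)
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or incomplete lines
        groups[int(parts[0])].append(float(parts[1]))
    return dict(groups)


# Stand-in for the real file; with a file on disk you could pass an open
# handle instead, e.g. `with open(path) as f: dd = group_pairs(f)`.
sample = io.StringIO("1 2\n3 585\n2 0\n1 500\n")
print(group_pairs(sample))  # {1: [2.0, 500.0], 3: [585.0], 2: [0.0]}
```

The resulting dict has the same shape as `dd` above, so it can be fed to the clustering loop unchanged.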
Before trying to understand the code below, have a look here.
"""
The code below builds a new dict from the previous one.
Each key maps to 2 lists holding the [min, max] values of each cluster.
"""
new_dd = defaultdict(list)
for k, v in dd.items():
    values = np.array(v, dtype=float)
    codebook, _ = kmeans(values, 2)  # 2 clusters
    cluster_indices, _ = vq(values, codebook)  # cluster index for each element
    # split the values into the 2 clusters
    zero_cluster = []
    one_cluster = []
    for i, val in enumerate(cluster_indices):
        if val == 0:
            zero_cluster.append(v[i])
        else:
            one_cluster.append(v[i])
    # a cluster can end up empty (kmeans may return fewer than 2 centroids); its bounds then stay 0
    min_zero = max_zero = 0
    min_one = max_one = 0
    if zero_cluster:
        min_zero = min(zero_cluster)
        max_zero = max(zero_cluster)
    if one_cluster:
        min_one = min(one_cluster)
        max_one = max(one_cluster)
    # store the [min, max] of both clusters under this key
    new_dd[k].append([[min_one, max_one], [min_zero, max_zero]])
new_dd = dict(new_dd)
new_dd = {k: v[0] for k, v in new_dd.items()}
print(new_dd)
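Note that `kmeans` picks its starting centroids at random, so the cluster bounds above can change between runs. Because each key here has one-dimensional values and exactly two clusters are wanted, a deterministic alternative (a swapped-in technique, not what the answer above uses) is to sort the values and try every split point, keeping the one with the smallest total within-cluster sum of squares, which is the usual k-means objective. In 1-D an optimal 2-means partition is always contiguous in sorted order, so this search is exact, although its result may differ from what a particular `kmeans` run returns. A minimal sketch; `two_cluster_bounds` is a hypothetical helper name:

```python
from typing import Sequence, Tuple

Bounds = Tuple[float, float]


def two_cluster_bounds(values: Sequence[float]) -> Tuple[Bounds, Bounds]:
    """Split 1-D values into 2 clusters minimising the total within-cluster
    sum of squares and return the (min, max) bounds of each cluster."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least 2 values to form 2 clusters")

    # prefix sums of x and x^2 let us get each side's sum of squares in O(1)
    ps = [0.0]
    ps2 = [0.0]
    for x in xs:
        ps.append(ps[-1] + x)
        ps2.append(ps2[-1] + x * x)

    def sse(i: int, j: int) -> float:
        # sum of squared deviations from the mean for xs[i:j]
        s = ps[j] - ps[i]
        s2 = ps2[j] - ps2[i]
        return s2 - s * s / (j - i)

    # split point k puts xs[:k] in the low cluster and xs[k:] in the high one
    best = min(range(1, n), key=lambda k: sse(0, k) + sse(k, n))
    return (xs[0], xs[best - 1]), (xs[best], xs[-1])


dd = {1: [2, 500], 3: [585, 54, 28, 85, 89, 47],
      2: [0, 668, 5000, 358, 632], 4: [28, 163, 906, 69, 258, 585]}
print({k: two_cluster_bounds(v) for k, v in dd.items()})
# {1: ((2, 2), (500, 500)), 3: ((28, 89), (585, 585)), 2: ((0, 668), (5000, 5000)), 4: ((28, 258), (585, 906))}
```

The `dd` here is the grouping the answer's code builds from `arr`. The cost is one sort plus a linear scan, so it stays cheap for long arrays, and every cluster is guaranteed to be non-empty, so no placeholder 0 bounds are needed.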