计算质心和精度
Compute centroid and accuracy
我有两个点 feat_left, feat_right
从暹罗网络获得,我将这些点绘制在 x,y
坐标中,如下所示。
这是python脚本
import json
import matplotlib.pyplot as plt
import numpy as np
data = json.load(open('predictions-mnist.txt'))
n=len(data['outputs'].items())
label_list = np.array(range(n))
feat_left = np.random.random((n,2))
count=1
for key,val in data['outputs'].items():
feat = data['outputs'][key]['feat_left']
feat_left[count-1] = feat
key = key.split("/")
key = int(key[6])
label_list[count - 1] = key
count = count + 1
f = plt.figure(figsize=(16,9))
c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff',
'#ff00ff', '#990000', '#999900', '#009900', '#009999']
for i in range(10):
plt.plot(feat_left[label_list==i,0].flatten(), feat_left[label_list==i,1].flatten(), '.', c=c[i])
plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
plt.grid()
plt.show()
现在我想计算每个簇的 centriod 然后 purity。
质心就是 mean
:
centorids = np.zeros((10,2), dtype='f4')
for i in xrange(10):
centroids[i,:] = np.mean( feat_left[label_list==i, :2], axis=0 )
至于准确度,您可以计算距质心的均方误差(距离):
sqerr = np.zeros((10,), dtype='f4')
for i in xrange(10):
sqerr[i] = np.sum( (feat_left[label_list==i, :2]-centroids[i,:])**2 )
计算 purity:
def compute_cluster_purity(gt_labels, pred_labels):
"""
Compute purity of predicted labels (pred_labels), given
the ground-truth labels (gt_labels).
Assuming gt_labels and pred_labels are both lists of int of length n
"""
n = len(gt_labels) # number of elements
assert len(pred_labels) == n
purity = 0
for l in set(pred_labels):
# for predicted label l, what are the gt_labels of this cluster?
gt = [gt_labels[i] for i, il in enumerate(pred_labels) if il==l]
# most frequent gt label in this cluster:
mfgt = max(set(gt), key=gt.count)
purity += gt.count(mfgt) # count intersection between most frequent ground truth and this cluster
return float(purity)/n
有关在群集中选择最频繁标签的更多详细信息,请参阅 this answer。
我有两个点 feat_left, feat_right
从暹罗网络获得,我将这些点绘制在 x,y
坐标中,如下所示。
这是python脚本
import json
import matplotlib.pyplot as plt
import numpy as np
data = json.load(open('predictions-mnist.txt'))
n=len(data['outputs'].items())
label_list = np.array(range(n))
feat_left = np.random.random((n,2))
count=1
for key,val in data['outputs'].items():
feat = data['outputs'][key]['feat_left']
feat_left[count-1] = feat
key = key.split("/")
key = int(key[6])
label_list[count - 1] = key
count = count + 1
f = plt.figure(figsize=(16,9))
c = ['#ff0000', '#ffff00', '#00ff00', '#00ffff', '#0000ff',
'#ff00ff', '#990000', '#999900', '#009900', '#009999']
for i in range(10):
plt.plot(feat_left[label_list==i,0].flatten(), feat_left[label_list==i,1].flatten(), '.', c=c[i])
plt.legend(['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'])
plt.grid()
plt.show()
现在我想计算每个簇的 centriod 然后 purity。
质心就是 mean
:
centorids = np.zeros((10,2), dtype='f4')
for i in xrange(10):
centroids[i,:] = np.mean( feat_left[label_list==i, :2], axis=0 )
至于准确度,您可以计算距质心的均方误差(距离):
sqerr = np.zeros((10,), dtype='f4')
for i in xrange(10):
sqerr[i] = np.sum( (feat_left[label_list==i, :2]-centroids[i,:])**2 )
计算 purity:
def compute_cluster_purity(gt_labels, pred_labels):
"""
Compute purity of predicted labels (pred_labels), given
the ground-truth labels (gt_labels).
Assuming gt_labels and pred_labels are both lists of int of length n
"""
n = len(gt_labels) # number of elements
assert len(pred_labels) == n
purity = 0
for l in set(pred_labels):
# for predicted label l, what are the gt_labels of this cluster?
gt = [gt_labels[i] for i, il in enumerate(pred_labels) if il==l]
# most frequent gt label in this cluster:
mfgt = max(set(gt), key=gt.count)
purity += gt.count(mfgt) # count intersection between most frequent ground truth and this cluster
return float(purity)/n
有关在群集中选择最频繁标签的更多详细信息,请参阅 this answer。