In Python, what is a good way to match expected values to real values?

Question

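Given a dictionary of ideal x,y locations, I have an unordered list of real x,y locations that are close to the ideal locations, and I need to sort them under the corresponding ideal-location dictionary key. Sometimes I get no data at all (0,0) for a given location. An example dataset is: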

idealLoc= {1:(907,1026),
           2:(892,1152),
           3:(921,1364),
           4:(969,1020),
           5:(949,1220),
           6:(951,1404),
   'No_Data':(0,0)}

realLoc = [[  892.,  1152.],
           [  969.,  1021.],
           [  906.,  1026.],
           [  949.,  1220.],
           [  951.,  1404.],
           [    0.,     0.]]

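The output would be a new dictionary with the real locations assigned to the correct dictionary key from idealLoc. I have considered the brute-force method (scanning the whole list n times, once for each best match), but I was wondering whether there is a more elegant/efficient way?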

Edit: below is the brute-force method

Dest = {}
dp = 6
for (y,x) in realLoc:
    for key, (r,c) in idealLoc.items():   
        if x > c-dp and x < c+dp and y > r-dp and y < r+dp:
            Dest[key] = [y,x]
            break

Answer 1

K-d trees are an efficient way to partition data in order to perform fast nearest-neighbour searches. You can use scipy.spatial.cKDTree to solve your problem:

import numpy as np
from scipy.spatial import cKDTree

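# convert inputs to numpy arrays (the mixed int/str keys become strings)
# note: dict.iteritems() is Python 2 only; items() works on Python 3
ilabels, ilocs = (np.array(vv) for vv in zip(*idealLoc.items()))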
rlocs = np.array(realLoc)

# construct a K-d tree that partitions the "ideal" points
tree = cKDTree(ilocs)

# query the tree with the real coordinates to find the nearest "ideal" neighbour
# for each "real" point
dist, idx = tree.query(rlocs, k=1)

# get the corresponding labels and coordinates
print(ilabels[idx])
# ['2' '4' '1' '5' '6' 'No_Data']

print(ilocs[idx])
# [[ 892 1152]
#  [ 969 1020]
#  [ 907 1026]
#  [ 949 1220]
#  [ 951 1404]
#  [   0    0]]
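To get the dictionary the question asks for, the query result can be mapped back to the original keys. The following is only a sketch under some assumptions: the names keys, max_dist and matched are made up for illustration, and the 6-unit tolerance is borrowed from dp in the question's brute-force code. Passing distance_upper_bound makes the tree report points with no ideal neighbour within that distance as unmatched (infinite distance).

```python
import numpy as np
from scipy.spatial import cKDTree

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

# Keep the original key objects in a list so they are not turned into strings
keys = list(idealLoc)
ilocs = np.array([idealLoc[k] for k in keys], dtype=float)
rlocs = np.array(realLoc)

tree = cKDTree(ilocs)

# Assumed tolerance, taken from dp = 6 in the question
max_dist = 6.0
dist, idx = tree.query(rlocs, k=1, distance_upper_bound=max_dist)

# Points without a neighbour inside max_dist come back with dist == inf,
# so they are skipped here
matched = {keys[i]: tuple(p)
           for p, d, i in zip(realLoc, dist, idx)
           if np.isfinite(d)}

print(matched)
```

With the sample data, key 3 has no real point nearby, so it is simply absent from matched. If two real points fall closest to the same ideal point, the later one overwrites the earlier one in this sketch.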

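By default cKDTree uses the Euclidean norm as its distance metric, but you can also specify the Manhattan norm, max norm, etc. by passing the p= keyword argument to tree.query().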
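As a small illustration of the p= argument, the sketch below queries the same tree with three different norms. The three ideal points are taken from the question; the single query point (906, 1028) is made up for this example so that the three distances differ.

```python
import numpy as np
from scipy.spatial import cKDTree

# A subset of the ideal points, plus one hypothetical measured point
ideal = np.array([[907, 1026], [892, 1152], [969, 1020]], dtype=float)
real = np.array([[906., 1028.]])

tree = cKDTree(ideal)

d2, i2 = tree.query(real, k=1)                # p=2: Euclidean norm (default)
d1, i1 = tree.query(real, k=1, p=1)           # p=1: Manhattan norm
dinf, iinf = tree.query(real, k=1, p=np.inf)  # p=inf: max norm

print(d2[0], d1[0], dinf[0])
```

All three queries pick the first ideal point here; only the reported distance changes (the offsets are 1 and 2, so the Manhattan distance is 3 and the max-norm distance is 2).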

There is also the scipy.interpolate.NearestNDInterpolator class, which is basically just a convenience wrapper around scipy.spatial.cKDTree.
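One way the interpolator could be used here, as a rough sketch: interpolate the row index of each ideal point, then turn the returned indices back into keys. The names keys, lookup and labels are made up for illustration, and storing indices as float values is just a convenience of this example.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

keys = list(idealLoc)
ilocs = np.array([idealLoc[k] for k in keys], dtype=float)
rlocs = np.array(realLoc)

# The "value" attached to each ideal point is its row index, so evaluating
# the interpolator returns the index of the nearest ideal point
lookup = NearestNDInterpolator(ilocs, np.arange(len(keys), dtype=float))
idx = lookup(rlocs).astype(int)

# Map the indices back to the original dictionary keys
labels = [keys[i] for i in idx]
print(labels)
```

Unlike tree.query(), this call does not return the distances, so it is harder to reject matches that are too far away.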

Answer 2

Assuming you want to use Euclidean distance, you can use scipy.spatial.distance.cdist to compute the distance matrix and then pick the nearest point.

import numpy
from scipy.spatial import distance

# list() is needed on Python 3, where dict.values() returns a view
ideal = numpy.array(list(idealLoc.values()))
real = numpy.array(realLoc)

dist = distance.cdist(ideal, real)

nearest_indexes = dist.argmin(axis=0)
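The indices returned by argmin refer to rows of the ideal array, so they still need to be mapped back to dictionary keys. A minimal sketch of that step follows; the names keys and assigned are made up for illustration, and no distance threshold is applied, so every real point is assigned to some key.

```python
import numpy as np
from scipy.spatial import distance

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

# Fix the key order explicitly so rows of `ideal` line up with `keys`
keys = list(idealLoc)
ideal = np.array([idealLoc[k] for k in keys], dtype=float)
real = np.array(realLoc)

# dist[i, j] is the Euclidean distance from ideal point i to real point j
dist = distance.cdist(ideal, real)

# For each real point (column), the row index of the closest ideal point
nearest_indexes = dist.argmin(axis=0)

# Build the output dictionary: ideal key -> real location
assigned = {keys[i]: realLoc[j] for j, i in enumerate(nearest_indexes)}
print(assigned)
```

The full distance matrix has one entry per (ideal, real) pair, which is fine for a handful of points like these but grows quickly for large inputs, where the K-d tree approach is likely the better fit.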


python

numpy

classification