In Python, what is a good way to match expected values to real values?
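Given a dictionary of ideal x,y locations, I have an unordered list of real x,y locations that are close to the ideal locations, and I need to classify them to the corresponding ideal-location dictionary key. Sometimes I get no data at all (0,0) for a given location.
An example dataset is: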
idealLoc = {1: (907, 1026),
            2: (892, 1152),
            3: (921, 1364),
            4: (969, 1020),
            5: (949, 1220),
            6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[ 892., 1152.],
           [ 969., 1021.],
           [ 906., 1026.],
           [ 949., 1220.],
           [ 951., 1404.],
           [   0.,    0.]]
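The output would be a new dictionary with the real locations assigned to the correct dictionary key from idealLoc. I have considered a brute-force method (scanning the whole list n times for each best match), but I was wondering whether there is a more elegant/efficient way?
Edit: below is the "brute force" method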
Dest = {}
dp = 6
for (y,x) in realLoc:
    for key, (r,c) in idealLoc.items():
        if x > c-dp and x < c+dp and y > r-dp and y < r+dp:
            Dest[key] = [y,x]
            break
K-d trees are an efficient way to partition data in order to perform fast nearest-neighbour searches. You can use scipy.spatial.cKDTree to solve your problem:
import numpy as np
from scipy.spatial import cKDTree
# convert inputs to numpy arrays
# (dict.items() replaces the Python 2-only dict.iteritems())
ilabels, ilocs = (np.array(vv) for vv in zip(*idealLoc.items()))
rlocs = np.array(realLoc)
# construct a K-d tree that partitions the "ideal" points
tree = cKDTree(ilocs)
# query the tree with the real coordinates to find the nearest "ideal" neighbour
# for each "real" point
dist, idx = tree.query(rlocs, k=1)
# get the corresponding labels and coordinates
print(ilabels[idx])
# ['2' '4' '1' '5' '6' 'No_Data']
print(ilocs[idx])
# [[ 892 1152]
# [ 969 1020]
# [ 907 1026]
# [ 949 1220]
# [ 951 1404]
# [ 0 0]]
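The question asks for a dictionary keyed like idealLoc, so the query result still has to be turned into one. A minimal sketch of that step follows; the name max_dist and its value of 6 are assumptions (borrowed from dp in the brute-force code), and distance_upper_bound is used so that real points with no ideal point within that radius are skipped rather than forced onto a far-away key.

```python
import numpy as np
from scipy.spatial import cKDTree

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

# keep the keys in a plain list so they retain their original types (int / str)
labels = list(idealLoc.keys())
tree = cKDTree(np.array(list(idealLoc.values())))

# assumed tolerance: ignore matches farther away than this
max_dist = 6
dist, idx = tree.query(np.array(realLoc), k=1, distance_upper_bound=max_dist)

Dest = {}
for point, d, i in zip(realLoc, dist, idx):
    # when nothing lies within max_dist, the returned distance is inf
    if np.isfinite(d):
        Dest[labels[i]] = point
```

Keys that receive no real point (key 3 in this dataset) are simply absent from Dest. If two real points fall nearest to the same ideal point, the later one overwrites the earlier one, so a stricter one-to-one matching would need extra handling.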
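By default cKDTree uses the Euclidean norm as the distance metric, but you can also specify the Manhattan norm, the max norm, etc. by passing the p= keyword argument to tree.query().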
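As a small illustration of the p= argument, the sketch below queries the same tree with three norms. The two ideal points and the query point are made-up values chosen so the distances are easy to verify by hand (offsets of 1 and 2 from the nearest point).

```python
import numpy as np
from scipy.spatial import cKDTree

# two illustrative ideal points and one query point (assumed values)
ideal_pts = np.array([(907, 1026), (969, 1020)])
query_pt = np.array([[906., 1028.]])

tree = cKDTree(ideal_pts)

d_euclid, i_euclid = tree.query(query_pt, k=1)              # p=2 is the default
d_manhattan, i_manhattan = tree.query(query_pt, k=1, p=1)   # |dx| + |dy|
d_max, i_max = tree.query(query_pt, k=1, p=np.inf)          # max(|dx|, |dy|)

print(d_euclid[0], d_manhattan[0], d_max[0])
```

For well-separated points like these, the choice of norm changes the reported distance but not which ideal point is selected.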
There is also the scipy.interpolate.NearestNDInterpolator class, which is basically just a convenience wrapper around scipy.spatial.cKDTree.
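A rough sketch of how the interpolator could be applied here: each ideal point is given its position in a label list as its "value", so interpolating at a real point returns the index of the nearest ideal point. Storing integer indices rather than the labels themselves is a choice made for this sketch, since the keys mix ints and strings.

```python
import numpy as np
from scipy.interpolate import NearestNDInterpolator

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

labels = list(idealLoc.keys())
ilocs = np.array(list(idealLoc.values()), dtype=float)

# the value attached to each ideal point is its index into labels
interp = NearestNDInterpolator(ilocs, np.arange(len(labels)))

# evaluate at the real points; cast to int because the result may be a float array
nearest = interp(np.array(realLoc))
matched_labels = [labels[int(i)] for i in nearest]
print(matched_labels)
```

This gives the same assignment as querying the tree directly, but it does not expose the distances, so it is less convenient when matches beyond some tolerance need to be rejected.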
Assuming you want to use Euclidean distance, you can use scipy.spatial.distance.cdist to compute the distance matrix and then choose the nearest point.
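import numpy
from scipy.spatial import distance
# list() is needed on Python 3, where dict.values() returns a view
ideal = numpy.array(list(idealLoc.values()))
real = numpy.array(realLoc)
# dist[i, j] is the distance from ideal point i to real point j
dist = distance.cdist(ideal, real)
# for each real point, the index of its nearest ideal point
nearest_indexes = dist.argmin(axis=0)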
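The indexes returned by argmin refer to positions in the array of ideal points, so they still need to be mapped back to dictionary keys. One possible way to do that is sketched below; the names keys and matched are introduced here for illustration only.

```python
import numpy as np
from scipy.spatial import distance

idealLoc = {1: (907, 1026), 2: (892, 1152), 3: (921, 1364),
            4: (969, 1020), 5: (949, 1220), 6: (951, 1404),
            'No_Data': (0, 0)}
realLoc = [[892., 1152.], [969., 1021.], [906., 1026.],
           [949., 1220.], [951., 1404.], [0., 0.]]

keys = list(idealLoc.keys())                  # same order as idealLoc.values()
ideal = np.array(list(idealLoc.values()))
real = np.array(realLoc)

dist = distance.cdist(ideal, real)            # shape (n_ideal, n_real)
nearest_indexes = dist.argmin(axis=0)         # one ideal index per real point

# map each real point (column j) to the key of its nearest ideal point (row i)
matched = {keys[i]: realLoc[j] for j, i in enumerate(nearest_indexes)}
print(matched)
```

The full distance matrix has n_ideal × n_real entries, so this approach is simplest for small inputs, while a K-d tree scales better as the number of points grows.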