Reading in the Z dimension in a KD-Tree

Question

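For a few months I have been working out how best to write a program that compares the geographic coordinates in several tables for similarity. I have tried everything from nested for loops to the KD-Tree I use now, which seems to work well. However, I am not sure it behaves correctly when it reads in my third dimension, which in this case is defined as Z.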

import numpy
from scipy import spatial
import math as ma

def d(a,b):
    # angular separation in radians between two (x, y) positions given in degrees
    d = ma.acos(ma.sin(ma.radians(a[1]))*ma.sin(ma.radians(b[1]))
                +ma.cos(ma.radians(a[1]))*ma.cos(ma.radians(b[1]))*(ma.cos(ma.radians((a[0]-b[0])))))
    return d

filename1 = "A"
pos1 = numpy.genfromtxt(filename1,
                 skip_header=1,
                 usecols=(1, 2))
z1 = numpy.genfromtxt(filename1,
                 skip_header=1,
                 usecols=(3))
filename2 = "B"
pos2 = numpy.genfromtxt(filename2,
                 #skip_header=1,
                 usecols=(0, 1))
z2 = numpy.genfromtxt(filename2,
                 #skip_header=1,
                 usecols=(2))

filename1 = "A"
data1 = numpy.genfromtxt(filename1,
                 skip_header=1)
                 #usecols=(0, 1))
filename2 = "B"
data2 = numpy.genfromtxt(filename2,
                  skip_header=1)
                  #usecols=(0, 1)
tree1 = spatial.KDTree(pos1)

match = tree1.query(pos2)
#print match
indices_pos1, indices_pos2 = [], []
for idx_pos1 in range(len(pos1)):
    # find indices in pos2 that match this position (idx_pos1)
    matching_indices_pos2 = numpy.where(match[1]==idx_pos1)[0]

    for idx_pos2 in matching_indices_pos2:
        # distance in spherical coordinates
        distance = d(pos1[idx_pos1], pos2[idx_pos2])

        if distance < 0.01 and z1[idx_pos1]-z2[idx_pos2] > 0.001:
            print pos1[idx_pos1], pos2[idx_pos2], z1[idx_pos1], z2[idx_pos2], distance

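As you can see, I first reduce the (x,y) positions to a single separation measured in spherical coordinates. Each element in file1 is compared with each element in file2. The problem lies somewhere in the Z dimension, but I can't seem to crack it. When the results are printed, the Z coordinates are often nowhere near each other. It seems as though my program is ignoring the and statement entirely. Below is a sample of results from my data showing that the z values really are far apart.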

[ 358.98787832   -3.87297365] [ 358.98667162   -3.82408566] 0.694282 0.5310796 0.000853515096105
[ 358.98787832   -3.87297365] [ 359.00303872   -3.8962745 ] 0.694282 0.5132215 0.000484847441066
[ 358.98787832   -3.87297365] [ 358.99624509   -3.84617685] 0.694282 0.5128636 0.000489860962243
[ 359.0065807    -8.81507801] [ 358.99226267   -8.8451829 ] 0.6865379 0.6675241 0.000580562641945
[ 359.0292886     9.31398903] [ 358.99296163    9.28436493] 0.68445694 0.45485374 0.000811677349685

Structure of the output: [position1 (x,y)] [position2 (x,y)] [Z1] [Z2] distance

As you can see, particularly in the last example, the Z coordinates are about .23 apart, which is far more than the .001 limit I set above.

Any insight you could share would be great!

Answer 1

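As for your original question, there is a simple problem with your sign. You test whether z1-z2 > 0.001, but you probably want abs(z1-z2) < 0.001 (note < rather than >). As written, the condition keeps exactly the pairs where z1 exceeds z2 by more than 0.001, which is why every printed row has widely separated z values.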
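
A minimal sketch of the corrected test, assuming the same thresholds as in the question (0.01 for the distance, 0.001 for z). The helper name is_match and its parameter names are made up for illustration; the first call uses the numbers from the last row of the posted output.

```python
def is_match(distance, z_a, z_b, max_distance=0.01, max_dz=0.001):
    """Return True when two points are close in position and in z."""
    # abs() makes the z test symmetric, so it does not matter which z is larger.
    return distance < max_distance and abs(z_a - z_b) < max_dz


# Last row of the output in the question: z differs by about 0.23.
print(is_match(0.000811677349685, 0.68445694, 0.45485374))
# Hypothetical pair whose z values differ by only 0.0004.
print(is_match(0.0005, 0.6944, 0.6940))
```

With this test in place of the original condition, the rows shown in the question would no longer be printed.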

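You can make the tree take the z coordinate into account as well, but then you need to give it the data as (x,y,z) rather than just (x,y). If it does not know the z values, it cannot use them.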
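
Something along these lines might work for a 3-D tree. It is only a sketch: the arrays are invented sample data standing in for the two files, and x, y and z are fed in unscaled, so the nearest neighbour mixes degrees with z units.

```python
import numpy as np
from scipy import spatial

# Hypothetical sample data; columns are x, y, z.
table_a = np.array([[10.0, 20.0, 0.50],
                    [10.0, 20.0, 0.90],   # same position as row 0, different z
                    [30.0, -5.0, 0.50]])
table_b = np.array([[10.01, 20.0, 0.89],
                    [30.0, -5.01, 0.52]])

# Build the tree on all three columns so z takes part in the search.
tree_xyz = spatial.cKDTree(table_a)

# For each row of table_b, find the nearest row of table_a in (x, y, z).
dist_xyz, idx_xyz = tree_xyz.query(table_b)
print(idx_xyz)
```

Here the first query point sits equally close to rows 0 and 1 in (x,y), and it is only the z column that lets the tree prefer row 1.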

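It should be possible (although the sklearn API may not allow this) to query the tree directly for a window, in which you bound the coordinate range and the z range independently. Imagine a box with different extents in the x, y and z directions. But because z has a different range of values, it is hard to combine these different scales.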
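
One way to approximate such a window with scipy is to divide each axis by its own tolerance and then search with the max-norm, whose "ball" is a cube. This is a sketch of that substitute technique, not a built-in box query; box_matches, the sample points and the half-widths are all assumptions made for illustration.

```python
import numpy as np
from scipy import spatial


def box_matches(points_a, points_b, half_widths):
    """For each row of points_b, list the rows of points_a that fall inside
    a box centred on it, with a separate half-width per axis."""
    half_widths = np.asarray(half_widths, dtype=float)
    # Dividing by the half-widths rescales the box to a cube of half-width 1.
    scaled_a = np.asarray(points_a, dtype=float) / half_widths
    scaled_b = np.asarray(points_b, dtype=float) / half_widths
    tree = spatial.cKDTree(scaled_a)
    # p=inf selects the Chebyshev (max-norm) metric.
    return tree.query_ball_point(scaled_b, r=1.0, p=np.inf)


# Hypothetical data: columns are x, y, z.
points_a = [[10.0, 20.0, 0.5000],
            [10.2, 20.1, 0.5005],
            [10.0, 20.0, 0.6000]]   # close in (x, y) but far away in z
points_b = [[10.1, 20.0, 0.5002]]

# Allow 0.5 in x and y, but only 0.001 in z.
hits = box_matches(points_a, points_b, half_widths=(0.5, 0.5, 0.001))
print(sorted(hits[0]))
```

Because each axis is scaled separately, the z tolerance can stay at 0.001 while the positional tolerance is chosen on its own scale.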

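Note that a k-d tree knows nothing about spherical coordinates. A point at +180 degrees and one at -180 degrees, or one at 0 degrees and one at 360 degrees, are very far apart as far as the k-d tree is concerned, yet their spherical distance is very small. So it will miss some points!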
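
A common workaround, sketched here under the assumption that x is a longitude and y a latitude in degrees, is to convert each position to a 3-D unit vector before building the tree. Straight-line (chord) distance between unit vectors grows with the angle between them, so the wrap-around at 0/360 disappears. The helper names and sample coordinates are hypothetical.

```python
import numpy as np
from scipy import spatial


def lonlat_to_unit_xyz(lon_deg, lat_deg):
    """Convert longitude/latitude in degrees to points on the unit sphere."""
    lon = np.radians(lon_deg)
    lat = np.radians(lat_deg)
    return np.column_stack((np.cos(lat) * np.cos(lon),
                            np.cos(lat) * np.sin(lon),
                            np.sin(lat)))


def chord_to_angle(chord):
    """Convert a chord length on the unit sphere to an angle in radians."""
    return 2.0 * np.arcsin(np.clip(chord / 2.0, 0.0, 1.0))


# Hypothetical positions straddling the 0/360 degree seam.
lon_a = np.array([359.99, 180.0])
lat_a = np.array([0.0, 45.0])
lon_b = np.array([0.01])
lat_b = np.array([0.0])

# Tree on raw degrees: 0.01 and 359.99 look almost 360 apart.
naive_tree = spatial.cKDTree(np.column_stack((lon_a, lat_a)))
_, naive_idx = naive_tree.query(np.column_stack((lon_b, lat_b)))

# Tree on unit vectors: the true neighbour across the seam is found.
sphere_tree = spatial.cKDTree(lonlat_to_unit_xyz(lon_a, lat_a))
chord, sphere_idx = sphere_tree.query(lonlat_to_unit_xyz(lon_b, lat_b))
angle_deg = np.degrees(chord_to_angle(chord))

print(naive_idx, sphere_idx, angle_deg)
```

In this example the raw-degree tree picks the point at 180 degrees, while the unit-vector tree picks the point at 359.99 degrees, about 0.02 degrees away.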

python

data-mining

geospatial

kdtree

coordinates