从每个象限获得最近点的快速方法

Question

我想快速（例如，<1 毫秒）按象限执行最近邻搜索。

每次搜索的输入：

二维中的一组固定点space（图像中的黑点）。这在搜索中保持不变，因此它可以存储在高效的数据结构中。 1 到 1000 之间的点数。
每次搜索位置不同的点（图像中的红色点）。抽象地说，这将 2D space 分成 4 个象限（在图像中由红线分隔），很像笛卡尔坐标系中的原点。

每次搜索的输出：

来自每个红色象限的黑点，最接近（图像中的蓝色圆圈）到红点。

输出通常应该是4个点，每个象限一个。但也有一些可能的边缘情况：

没有黑点的象限；该象限不计分
一个象限有多个关系；收集该象限的所有此类点
同一点位于象限的边界（红线）：不要多次收集该点

我知道行不通的事情：

沿着一个轴的最近点，比如 x，可能会很远（图中绿色圆圈）
一个象限中第二近的点（图中紫色圆圈）可能比其他象限中的点更近。简单搜索一下红点的4个最近邻就可以搜到这个我不要的点。

编辑：已接受答案代码的版本和时间

def trial1(quadneighbori,
           printout = False, seedN = None, Npts = 1000, Niter = 1000):

    if seedN != None: np.random.seed(seedN) # random seed

    # Generate Npts (x,y) coordinates where x and y are standard normal
    dataset = np.random.randn(Npts,2)

    for n in range(Niter):
        # Generate random pixel (x,y) coordinates where x and y are standard normal
        red = np.random.randn(1,2)
        dst, i = quadneighbori(dataset, red)
        if printout: print(dst, i)

def quadneighbor1(dataset, red):
    dst = np.zeros(4)
    closest = np.zeros((4,2))

    # Work out a Boolean mask for the 4 quadrants
    right_exclu = dataset[:,0] > red[0,0]
    top_exclu =   dataset[:,1] > red[0,1]
    Q1 = np.logical_and( top_exclu, right_exclu)
    Q2 = np.logical_and(~top_exclu, right_exclu)
    Q3 = np.logical_and(~top_exclu,~right_exclu)
    Q4 = np.logical_and( top_exclu,~right_exclu)
    Qs = [Q1, Q2, Q3, Q4]

    for Qi in range(4):
        # Boolean mask to select points in this quadrant
        thisQuad = dataset[Qs[Qi]]
        if len(thisQuad)==0: continue # if no points, move on to next quadrant
        # Calculate distance of each point in dataset to red point
        distances = cdist(thisQuad, red, 'sqeuclidean')
        # Choose nearest
        idx = np.argmin(distances)
        dst[Qi] = distances[idx]
        closest[Qi] = thisQuad[idx]

    return dst, closest

# numba turns 1.53s trial1 to 4.12ms trial1
@nb.jit(nopython=True, fastmath=True)
def quadneighbor2(dataset, red):
   redX, redY = red[0,0], red[0,1]
   # Distance to and index of nearest point in each quadrant
   dst = np.zeros(4) + 1.0e308           # Start large! Update with something smaller later
   idx = np.zeros(4, dtype=np.uint32)
   for i in nb.prange(dataset.shape[0]):
      # Get this point's x,y
      x, y = dataset[i,0], dataset[i,1]
      # Get quadrant of this point (minus 1)
      if x>redX:
         if y>redY:
             Qi = 0
         else:
             Qi = 1
      else:
         if y>redY:
             Qi = 3
         else:
             Qi = 2

      # Get distance (squared) of this point - square root is slow and unnecessary
      d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
      # Update if nearest
      if d<dst[Qi]:
         dst[Qi] = d
         idx[Qi] = i

   return dst, idx

%timeit trial1(quadneighbor1)
111 ms ± 3.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit trial1(quadneighbor2)
4.12 ms ± 79.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

PS：cdist 与 numba 不兼容，但调整行以删除 cdist 并将 @nb.jit 添加到 quadneighbor1 速度trial1(quadneighbor1) 从 111 毫秒增加到 30.1 毫秒。

Answer 1

更新答案

我已经在numba下修改了原答案运行。它现在在 3 微秒内完成 1,000 个点的数据集 - 与原来的 1 毫秒左右相比要快得多。

#!/usr/bin/env python3

import random, sys
import numpy as np
import numba as nb

@nb.jit(nopython=True, fastmath=True)
def QuadNearestNeighbour(dataset,redX,redY):
   # Distance to and index of nearest point in each quadrant
   dst = np.zeros(4) + 1.0e308           # Start large! Update with something smaller later
   idx = np.zeros(4, dtype=np.uint32)
   for i in nb.prange(dataset.shape[0]):
      # Get this point's x,y
      x, y = dataset[i,0], dataset[i,1]
      # Get quadrant of this point (minus 1)
      if x>redX:
         if y>redY:
             Q = 0
         else:
             Q = 1
      else:
         if y>redY:
             Q = 3
         else:
             Q = 2

      # Get distance (squared) of this point - square root is slow and unnecessary
      d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
      # Update if nearest
      if d<dst[Q]:
         dst[Q] = d
         idx[Q] = i

   return  dst, idx

# Number of points
Npts = 1000

# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)

# Generate random red pixel (X,Y)
redX, redY = random.random(), random.random()

res = QuadNearestNeighbour(dataset,redX,redY)
print(res)

原答案

1,000 个点的运行秒不到 1 毫秒 - 为红点计时超过 1,000 个不同的位置。

这将生成一个包含 1,000 个点的数据集，然后生成 1,000 个红色中心并为每个中心做 4 个象限。在我的 MacBook 上运行不到一秒，所以每个红色中心 1 毫秒。

我相信它可以通过删除无关的 print 语句来进一步优化，而不是计算仅用于说明的内容，也许使用 numba 但一天就足够了。

我没有添加很多错误检查或边缘情况，它不是生产质量代码。

#!/usr/bin/env python3

import random
import numpy as np
from scipy.spatial.distance import cdist

# Let's get some repeatable randomness
np.random.seed(42)

# Number of points, number of iterations
Npts, Niter = 1000, 1000

# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)

# Run lots of iterations for better timing
for n in range(Niter):

   # Generate random red pixel (X,Y)
   red = np.random.rand(1,2)

   # Work out a Boolean mask for the 4 quadrants
   above = dataset[:,0]>red[0,0]
   right = dataset[:,1]>red[0,1]
   Q1 = np.logical_and(above,right)       # top-right
   Q2 = np.logical_and(above,~right)      # top-left
   Q3 = np.logical_and(~above,~right)     # bottom-left
   Q4 = np.logical_and(~above,right)      # bottom-right

   Q = 1
   for quadrant in [Q1,Q2,Q3,Q4]:
       print(f'Quadrant {Q}: ')
       # Boolean mask to select points in this quadrant
       thisQuad = dataset[quadrant]
       l = len(thisQuad)
       if l==0:
           print('No points in quadrant')
           continue
       # print(f'nPoints in quadrant: {l}')
       # print(f'Points: {dataset[quadrant]}')
       # Calculate distance of each point in dataset to red point
       distances = cdist(thisQuad, red, 'sqeuclidean')
       # Choose nearest
       idx = np.argmin(distances)
       print(f'Index: {idx}, point: {thisQuad[idx]}, distance: {distances[idx]}')
       Q += 1

从每个象限获得最近点的快速方法

Fast way to get closest point from each quadrant

python

algorithm

nearest-neighbor

closest