从每个象限获得最近点的快速方法
Fast way to get closest point from each quadrant
我想快速(例如,<1 毫秒)按象限执行最近邻搜索。
每次搜索的输入:
- 二维中的一组固定点space(图像中的黑点)。这在搜索中保持不变,因此它可以存储在高效的数据结构中。 1 到 1000 之间的点数。
- 每次搜索位置不同的点(图像中的红色点)。抽象地说,这将 2D space 分成 4 个象限(在图像中由红线分隔),很像笛卡尔坐标系中的原点。
每次搜索的输出:
- 来自每个 红色象限的黑点,最接近(图像中的蓝色圆圈)到红点。
输出通常应该是4个点,每个象限一个。但也有一些可能的边缘情况:
- 没有黑点的象限;该象限不计分
- 一个象限有多个关系;收集该象限的所有此类点
- 同一点位于象限的边界(红线):不要多次收集该点
我知道行不通的事情:
- 沿着一个轴的最近点,比如 x,可能会很远(图中绿色圆圈)
- 一个象限中第二近的点(图中紫色圆圈)可能比其他象限中的点更近。简单搜索一下红点的4个最近邻就可以搜到这个我不要的点。
编辑:已接受答案代码的版本和时间
def trial1(quadneighbori,
printout = False, seedN = None, Npts = 1000, Niter = 1000):
if seedN != None: np.random.seed(seedN) # random seed
# Generate Npts (x,y) coordinates where x and y are standard normal
dataset = np.random.randn(Npts,2)
for n in range(Niter):
# Generate random pixel (x,y) coordinates where x and y are standard normal
red = np.random.randn(1,2)
dst, i = quadneighbori(dataset, red)
if printout: print(dst, i)
def quadneighbor1(dataset, red):
dst = np.zeros(4)
closest = np.zeros((4,2))
# Work out a Boolean mask for the 4 quadrants
right_exclu = dataset[:,0] > red[0,0]
top_exclu = dataset[:,1] > red[0,1]
Q1 = np.logical_and( top_exclu, right_exclu)
Q2 = np.logical_and(~top_exclu, right_exclu)
Q3 = np.logical_and(~top_exclu,~right_exclu)
Q4 = np.logical_and( top_exclu,~right_exclu)
Qs = [Q1, Q2, Q3, Q4]
for Qi in range(4):
# Boolean mask to select points in this quadrant
thisQuad = dataset[Qs[Qi]]
if len(thisQuad)==0: continue # if no points, move on to next quadrant
# Calculate distance of each point in dataset to red point
distances = cdist(thisQuad, red, 'sqeuclidean')
# Choose nearest
idx = np.argmin(distances)
dst[Qi] = distances[idx]
closest[Qi] = thisQuad[idx]
return dst, closest
# numba turns 1.53s trial1 to 4.12ms trial1
@nb.jit(nopython=True, fastmath=True)
def quadneighbor2(dataset, red):
redX, redY = red[0,0], red[0,1]
# Distance to and index of nearest point in each quadrant
dst = np.zeros(4) + 1.0e308 # Start large! Update with something smaller later
idx = np.zeros(4, dtype=np.uint32)
for i in nb.prange(dataset.shape[0]):
# Get this point's x,y
x, y = dataset[i,0], dataset[i,1]
# Get quadrant of this point (minus 1)
if x>redX:
if y>redY:
Qi = 0
else:
Qi = 1
else:
if y>redY:
Qi = 3
else:
Qi = 2
# Get distance (squared) of this point - square root is slow and unnecessary
d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
# Update if nearest
if d<dst[Qi]:
dst[Qi] = d
idx[Qi] = i
return dst, idx
%timeit trial1(quadneighbor1)
111 ms ± 3.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit trial1(quadneighbor2)
4.12 ms ± 79.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PS:cdist
与 numba
不兼容,但调整行以删除 cdist
并将 @nb.jit
添加到 quadneighbor1
速度trial1(quadneighbor1)
从 111 毫秒增加到 30.1 毫秒。
更新答案
我已经在numba
下修改了原答案运行。它现在在 3 微秒内完成 1,000 个点的数据集 - 与原来的 1 毫秒左右相比要快得多。
#!/usr/bin/env python3
import random, sys
import numpy as np
import numba as nb
@nb.jit(nopython=True, fastmath=True)
def QuadNearestNeighbour(dataset,redX,redY):
# Distance to and index of nearest point in each quadrant
dst = np.zeros(4) + 1.0e308 # Start large! Update with something smaller later
idx = np.zeros(4, dtype=np.uint32)
for i in nb.prange(dataset.shape[0]):
# Get this point's x,y
x, y = dataset[i,0], dataset[i,1]
# Get quadrant of this point (minus 1)
if x>redX:
if y>redY:
Q = 0
else:
Q = 1
else:
if y>redY:
Q = 3
else:
Q = 2
# Get distance (squared) of this point - square root is slow and unnecessary
d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
# Update if nearest
if d<dst[Q]:
dst[Q] = d
idx[Q] = i
return dst, idx
# Number of points
Npts = 1000
# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)
# Generate random red pixel (X,Y)
redX, redY = random.random(), random.random()
res = QuadNearestNeighbour(dataset,redX,redY)
print(res)
原答案
1,000 个点的 运行 秒不到 1 毫秒 - 为红点计时超过 1,000 个不同的位置。
这将生成一个包含 1,000 个点的数据集,然后生成 1,000 个红色中心并为每个中心做 4 个象限。在我的 MacBook 上 运行不到一秒,所以每个红色中心 1 毫秒。
我相信它可以通过删除无关的 print
语句来进一步优化,而不是计算仅用于说明的内容,也许使用 numba
但一天就足够了。
我没有添加很多错误检查或边缘情况,它不是生产质量代码。
#!/usr/bin/env python3
import random
import numpy as np
from scipy.spatial.distance import cdist
# Let's get some repeatable randomness
np.random.seed(42)
# Number of points, number of iterations
Npts, Niter = 1000, 1000
# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)
# Run lots of iterations for better timing
for n in range(Niter):
# Generate random red pixel (X,Y)
red = np.random.rand(1,2)
# Work out a Boolean mask for the 4 quadrants
above = dataset[:,0]>red[0,0]
right = dataset[:,1]>red[0,1]
Q1 = np.logical_and(above,right) # top-right
Q2 = np.logical_and(above,~right) # top-left
Q3 = np.logical_and(~above,~right) # bottom-left
Q4 = np.logical_and(~above,right) # bottom-right
Q = 1
for quadrant in [Q1,Q2,Q3,Q4]:
print(f'Quadrant {Q}: ')
# Boolean mask to select points in this quadrant
thisQuad = dataset[quadrant]
l = len(thisQuad)
if l==0:
print('No points in quadrant')
continue
# print(f'nPoints in quadrant: {l}')
# print(f'Points: {dataset[quadrant]}')
# Calculate distance of each point in dataset to red point
distances = cdist(thisQuad, red, 'sqeuclidean')
# Choose nearest
idx = np.argmin(distances)
print(f'Index: {idx}, point: {thisQuad[idx]}, distance: {distances[idx]}')
Q += 1
我想快速(例如,<1 毫秒)按象限执行最近邻搜索。
每次搜索的输入:
- 二维中的一组固定点space(图像中的黑点)。这在搜索中保持不变,因此它可以存储在高效的数据结构中。 1 到 1000 之间的点数。
- 每次搜索位置不同的点(图像中的红色点)。抽象地说,这将 2D space 分成 4 个象限(在图像中由红线分隔),很像笛卡尔坐标系中的原点。
每次搜索的输出:
- 来自每个 红色象限的黑点,最接近(图像中的蓝色圆圈)到红点。
输出通常应该是4个点,每个象限一个。但也有一些可能的边缘情况:
- 没有黑点的象限;该象限不计分
- 一个象限有多个关系;收集该象限的所有此类点
- 同一点位于象限的边界(红线):不要多次收集该点
我知道行不通的事情:
- 沿着一个轴的最近点,比如 x,可能会很远(图中绿色圆圈)
- 一个象限中第二近的点(图中紫色圆圈)可能比其他象限中的点更近。简单搜索一下红点的4个最近邻就可以搜到这个我不要的点。
编辑:已接受答案代码的版本和时间
def trial1(quadneighbori,
printout = False, seedN = None, Npts = 1000, Niter = 1000):
if seedN != None: np.random.seed(seedN) # random seed
# Generate Npts (x,y) coordinates where x and y are standard normal
dataset = np.random.randn(Npts,2)
for n in range(Niter):
# Generate random pixel (x,y) coordinates where x and y are standard normal
red = np.random.randn(1,2)
dst, i = quadneighbori(dataset, red)
if printout: print(dst, i)
def quadneighbor1(dataset, red):
dst = np.zeros(4)
closest = np.zeros((4,2))
# Work out a Boolean mask for the 4 quadrants
right_exclu = dataset[:,0] > red[0,0]
top_exclu = dataset[:,1] > red[0,1]
Q1 = np.logical_and( top_exclu, right_exclu)
Q2 = np.logical_and(~top_exclu, right_exclu)
Q3 = np.logical_and(~top_exclu,~right_exclu)
Q4 = np.logical_and( top_exclu,~right_exclu)
Qs = [Q1, Q2, Q3, Q4]
for Qi in range(4):
# Boolean mask to select points in this quadrant
thisQuad = dataset[Qs[Qi]]
if len(thisQuad)==0: continue # if no points, move on to next quadrant
# Calculate distance of each point in dataset to red point
distances = cdist(thisQuad, red, 'sqeuclidean')
# Choose nearest
idx = np.argmin(distances)
dst[Qi] = distances[idx]
closest[Qi] = thisQuad[idx]
return dst, closest
# numba turns 1.53s trial1 to 4.12ms trial1
@nb.jit(nopython=True, fastmath=True)
def quadneighbor2(dataset, red):
redX, redY = red[0,0], red[0,1]
# Distance to and index of nearest point in each quadrant
dst = np.zeros(4) + 1.0e308 # Start large! Update with something smaller later
idx = np.zeros(4, dtype=np.uint32)
for i in nb.prange(dataset.shape[0]):
# Get this point's x,y
x, y = dataset[i,0], dataset[i,1]
# Get quadrant of this point (minus 1)
if x>redX:
if y>redY:
Qi = 0
else:
Qi = 1
else:
if y>redY:
Qi = 3
else:
Qi = 2
# Get distance (squared) of this point - square root is slow and unnecessary
d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
# Update if nearest
if d<dst[Qi]:
dst[Qi] = d
idx[Qi] = i
return dst, idx
%timeit trial1(quadneighbor1)
111 ms ± 3.21 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
%timeit trial1(quadneighbor2)
4.12 ms ± 79.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
PS:cdist
与 numba
不兼容,但调整行以删除 cdist
并将 @nb.jit
添加到 quadneighbor1
速度trial1(quadneighbor1)
从 111 毫秒增加到 30.1 毫秒。
更新答案
我已经在numba
下修改了原答案运行。它现在在 3 微秒内完成 1,000 个点的数据集 - 与原来的 1 毫秒左右相比要快得多。
#!/usr/bin/env python3
import random, sys
import numpy as np
import numba as nb
@nb.jit(nopython=True, fastmath=True)
def QuadNearestNeighbour(dataset,redX,redY):
# Distance to and index of nearest point in each quadrant
dst = np.zeros(4) + 1.0e308 # Start large! Update with something smaller later
idx = np.zeros(4, dtype=np.uint32)
for i in nb.prange(dataset.shape[0]):
# Get this point's x,y
x, y = dataset[i,0], dataset[i,1]
# Get quadrant of this point (minus 1)
if x>redX:
if y>redY:
Q = 0
else:
Q = 1
else:
if y>redY:
Q = 3
else:
Q = 2
# Get distance (squared) of this point - square root is slow and unnecessary
d = (x-redX)*(x-redX) + (y-redY)*(y-redY)
# Update if nearest
if d<dst[Q]:
dst[Q] = d
idx[Q] = i
return dst, idx
# Number of points
Npts = 1000
# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)
# Generate random red pixel (X,Y)
redX, redY = random.random(), random.random()
res = QuadNearestNeighbour(dataset,redX,redY)
print(res)
原答案
1,000 个点的 运行 秒不到 1 毫秒 - 为红点计时超过 1,000 个不同的位置。
这将生成一个包含 1,000 个点的数据集,然后生成 1,000 个红色中心并为每个中心做 4 个象限。在我的 MacBook 上 运行不到一秒,所以每个红色中心 1 毫秒。
我相信它可以通过删除无关的 print
语句来进一步优化,而不是计算仅用于说明的内容,也许使用 numba
但一天就足够了。
我没有添加很多错误检查或边缘情况,它不是生产质量代码。
#!/usr/bin/env python3
import random
import numpy as np
from scipy.spatial.distance import cdist
# Let's get some repeatable randomness
np.random.seed(42)
# Number of points, number of iterations
Npts, Niter = 1000, 1000
# Generate the Npts random X,Y points
dataset = np.random.rand(Npts,2)
# Run lots of iterations for better timing
for n in range(Niter):
# Generate random red pixel (X,Y)
red = np.random.rand(1,2)
# Work out a Boolean mask for the 4 quadrants
above = dataset[:,0]>red[0,0]
right = dataset[:,1]>red[0,1]
Q1 = np.logical_and(above,right) # top-right
Q2 = np.logical_and(above,~right) # top-left
Q3 = np.logical_and(~above,~right) # bottom-left
Q4 = np.logical_and(~above,right) # bottom-right
Q = 1
for quadrant in [Q1,Q2,Q3,Q4]:
print(f'Quadrant {Q}: ')
# Boolean mask to select points in this quadrant
thisQuad = dataset[quadrant]
l = len(thisQuad)
if l==0:
print('No points in quadrant')
continue
# print(f'nPoints in quadrant: {l}')
# print(f'Points: {dataset[quadrant]}')
# Calculate distance of each point in dataset to red point
distances = cdist(thisQuad, red, 'sqeuclidean')
# Choose nearest
idx = np.argmin(distances)
print(f'Index: {idx}, point: {thisQuad[idx]}, distance: {distances[idx]}')
Q += 1