How do I find the nearest neighbors of points given their coordinates in Python?

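I have a lot of points with their coordinates. I want to print the three nearest neighbors of at least one point, together with their distances to that point. How can I do this in Python? The coordinates are in the WGS84 system.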

NAME    Latitude    Longitude
B   50.94029883 7.019146728
C   50.92073002 6.975268711
D   50.99807758 6.980865543
E   50.98074288 7.035060206
F   51.00696972 7.035993783
G   50.97369889 6.928538763
H   50.94133859 6.927878587
A   50.96712502 6.977825322

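Nearest-neighbor techniques are more efficient when there are many points:

  • Brute force (i.e. looping over all pairs of points) has complexity O(N^2)
  • Tree-based nearest-neighbor algorithms have complexity O(N*log(N))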
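
For comparison, the brute-force option from the list above can be sketched with the standard library alone. This is a minimal illustration, not part of the original answer: the names `haversine_km`, `nearest_neighbors` and `EARTH_RADIUS_KM` are hypothetical, and the Earth is assumed to be a sphere of radius 6371 km, which is only an approximation of the WGS84 ellipsoid.

```python
import math

# Assumption: spherical Earth with mean radius 6371 km (WGS84 is an ellipsoid)
EARTH_RADIUS_KM = 6371.0

# Points from the question: name -> (latitude, longitude) in degrees
points = {
    "B": (50.94029883, 7.019146728),
    "C": (50.92073002, 6.975268711),
    "D": (50.99807758, 6.980865543),
    "E": (50.98074288, 7.035060206),
    "F": (51.00696972, 7.035993783),
    "G": (50.97369889, 6.928538763),
    "H": (50.94133859, 6.927878587),
    "A": (50.96712502, 6.977825322),
}


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_neighbors(name, k=3):
    """Return the k closest other points as (distance_km, name), nearest first."""
    lat, lon = points[name]
    dists = [
        (haversine_km(lat, lon, *coords), other)
        for other, coords in points.items()
        if other != name  # skip the query point itself
    ]
    return sorted(dists)[:k]


for dist, other in nearest_neighbors("B"):
    print(f"{other} with distance {dist:.4f} km")
```

Each call scans every other point, so running it for all N points costs O(N^2) distance evaluations. That is fine for eight points but is the cost the tree-based structures avoid.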

Nearest neighbors in Python
  1. BallTree
  2. KdTree
  3. Explaining Nearest Neighbor
  4. BallTree vs. KdTree Performance

Illustration of using BallTree on your problem (related

Code

import pandas as pd
import numpy as np

from sklearn.neighbors import BallTree
from io import StringIO

# Create DataFrame from your lat/lon dataset
data = """NAME Latitude Longitude
B 50.94029883 7.019146728
C 50.92073002 6.975268711
D 50.99807758 6.980865543
E 50.98074288 7.035060206
F 51.00696972 7.035993783
G 50.97369889 6.928538763
H 50.94133859 6.927878587
A 50.96712502 6.977825322"""

# Use StringIO to allow reading of string as CSV
df = pd.read_csv(StringIO(data), sep = ' ')

# Set up the BallTree using df as the reference dataset
# The haversine metric gives great-circle distances between points on the Earth;
# it expects [latitude, longitude] in radians and returns distances in radians
# haversine - https://pypi.org/project/haversine/ 
tree = BallTree(np.deg2rad(df[['Latitude', 'Longitude']].values), metric='haversine')

# Setup distance queries (points for which we want to find nearest neighbors)
other_data = """NAME Latitude Longitude
B_alt 50.94029883 7.019146728
C_alt 50.92073002 6.975268711"""

df_other = pd.read_csv(StringIO(other_data), sep = ' ')

query_lats = df_other['Latitude']
query_lons = df_other['Longitude']

# Find the closest points in the reference dataset for each point in df_other
# use k = 3 for 3 closest neighbors
distances, indices = tree.query(np.deg2rad(np.c_[query_lats, query_lons]), k = 3)

r_km = 6371 # mean Earth radius in km; converts haversine distances (radians) to km
for name, d, ind in zip(df_other['NAME'], distances, indices):
  print(f"NAME {name} closest matches:")
  for i, index in enumerate(ind):
    print(f"\t{df['NAME'][index]} with distance {d[i]*r_km:.4f} km")

Output

NAME B_alt closest matches:
    B with distance 0.0000 km
    C with distance 3.7671 km
    A with distance 4.1564 km
NAME C_alt closest matches:
    C with distance 0.0000 km
    B with distance 3.7671 km
    H with distance 4.0350 km
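
In the output above, the first match for each query is the point itself at distance 0, because the query coordinates also exist in the reference dataset. If the neighbors are wanted for points that are already in the dataset, one possible variation (a sketch, not part of the original answer) is to query the tree with its own points, ask for k + 1 neighbors, and drop the self match. The variable names `names`, `coords_deg`, `neighbors` and `EARTH_RADIUS_KM` are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import BallTree

# Points from the question; columns are [latitude, longitude] in degrees
names = ["B", "C", "D", "E", "F", "G", "H", "A"]
coords_deg = np.array([
    [50.94029883, 7.019146728],
    [50.92073002, 6.975268711],
    [50.99807758, 6.980865543],
    [50.98074288, 7.035060206],
    [51.00696972, 7.035993783],
    [50.97369889, 6.928538763],
    [50.94133859, 6.927878587],
    [50.96712502, 6.977825322],
])

# Assumption: spherical Earth with mean radius 6371 km
EARTH_RADIUS_KM = 6371.0

coords_rad = np.deg2rad(coords_deg)
tree = BallTree(coords_rad, metric="haversine")

k = 3
# Ask for k + 1 neighbors because each point also finds itself
distances, indices = tree.query(coords_rad, k=k + 1)

neighbors = {}
for i, name in enumerate(names):
    # Filter by index rather than assuming the self match is always first
    matches = [
        (names[j], float(d) * EARTH_RADIUS_KM)
        for j, d in zip(indices[i], distances[i])
        if j != i
    ]
    neighbors[name] = matches[:k]

for name, matches in neighbors.items():
    print(f"NAME {name} closest matches:")
    for other, dist_km in matches:
        print(f"\t{other} with distance {dist_km:.4f} km")
```

Filtering on the index instead of slicing off the first column keeps the result correct even if two different points happen to share the same coordinates.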