Finding pairs of latitude and longitude within a certain radius in Python

Given a DataFrame df as follows:

    id              location        lon       lat
0    1            Onyx Spire  116.35425  39.87760
1    2        Unison Lookout  116.44333  39.93237
2    3       History Lookout  116.14857  39.73727
3    4     Domination Pillar  116.46387  39.96286
4    5           Union Tower  116.36373  39.95064
5    6   Ruby Forest Obelisk  116.35786  39.89463
6    7      Rust Peak Pillar  116.34870  39.98170
7    8      Ash Forest Tower  116.38461  39.94938
8    9  Prestige Mound Tower  116.34052  39.98977
9   10  Sapphire Mound Tower  116.35063  39.92982
10  11       Kinship Lookout  116.43020  39.99997
11  12    Exhibition Obelisk  116.45108  39.94371

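For each location, I need to find the names of the other locations whose distance from it is less than or equal to some threshold, say 5 km.

The code is based on the answer from this link: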

import pandas as pd
from math import sin, cos, sqrt, atan2, radians
from scipy.spatial import distance

def get_distance(point1, point2):
    # Haversine great-circle distance in km between two (lat, lon) points in degrees
    R = 6370  # approximate Earth radius in km
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

# Pairwise distance matrix: entry [i, j] is the distance in km between rows i and j
all_points = df[['lat', 'lon']].values
dm = distance.cdist(all_points, all_points, get_distance)
pd.DataFrame(dm, index=df.index, columns=df.index)

Output:

           0          1          2   ...         9          10         11
0    0.000000   9.736316  23.494395  ...   5.813891  15.066709  11.054762
1    9.736316   0.000000  33.222475  ...   7.908015   7.598415   1.423357
2   23.494395  33.222475   0.000000  ...  27.492814  37.822285  34.549129
3   13.312235   3.815179  36.787014  ...  10.327235   5.024900   2.391864
4    8.160542   7.082601  30.000842  ...   2.569988   7.883467   7.484839
5    1.918235   8.409888  25.009951  ...   3.960618  13.235325   9.641336
6   11.583243   9.752599  32.096627  ...   5.770232   7.233093   9.692770
7    8.389761   5.350670  31.017383  ...   3.622002   6.835323   5.700434
8   12.525586  10.838805  32.501864  ...   6.720541   7.722060  10.722467
9    5.813891   7.908015  27.492814  ...   0.000000  10.334273   8.701063
10  15.066709   7.598415  37.822285  ...  10.334273   0.000000   6.502921
11  11.054762   1.423357  34.549129  ...   8.701063   6.502921   0.000000

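But I would like to get output similar to the following DataFrame. Note that location1, location2 and location3 are the names of the locations within 5 km of location (the pairings shown may not be accurate; they are only an example to aid understanding). NaN means no such location exists.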

    id              location  ...            location2          location3
0    1            Onyx Spire  ...                  NaN                NaN
1    2        Unison Lookout  ...                  NaN                NaN
2    3       History Lookout  ...                  NaN                NaN
3    4     Domination Pillar  ...                  NaN                NaN
4    5           Union Tower  ...                  NaN                NaN
5    6   Ruby Forest Obelisk  ...                  NaN                NaN
6    7      Rust Peak Pillar  ...                  NaN                NaN
7    8      Ash Forest Tower  ...      Kinship Lookout                NaN
8    9  Prestige Mound Tower  ...                  NaN                NaN
9   10  Sapphire Mound Tower  ...                  NaN                NaN
10  11       Kinship Lookout  ...  Ruby Forest Obelisk  Domination Pillar
11  12    Exhibition Obelisk  ...                  NaN                NaN

How can I do this in Python? Thanks.

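The idea is to create a mask for values that are non-zero and less than 5 km, then use DataFrame.dot for matrix multiplication with the location names, and finally use Series.str.split and join the new columns to the original. Note that the mask uses a strict comparison (lt); use df1.le(5) instead if locations exactly 5 km apart should be included: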

df1 = pd.DataFrame(dm, index=df.index, columns=df.index)

df = (df.join((df1.ne(0) & df1.lt(5)).dot(df['location']+ ',')
                                     .str[:-1]
                                     .str.split(',', expand=True)
                                     .add_prefix('loc')))

print (df)
    id              location        lon       lat                 loc0  \
0    1            Onyx Spire  116.35425  39.87760  Ruby Forest Obelisk   
1    2        Unison Lookout  116.44333  39.93237    Domination Pillar   
2    3       History Lookout  116.14857  39.73727                        
3    4     Domination Pillar  116.46387  39.96286       Unison Lookout   
4    5           Union Tower  116.36373  39.95064     Rust Peak Pillar   
5    6   Ruby Forest Obelisk  116.35786  39.89463           Onyx Spire   
6    7      Rust Peak Pillar  116.34870  39.98170          Union Tower   
7    8      Ash Forest Tower  116.38461  39.94938          Union Tower   
8    9  Prestige Mound Tower  116.34052  39.98977          Union Tower   
9   10  Sapphire Mound Tower  116.35063  39.92982          Union Tower   
10  11       Kinship Lookout  116.43020  39.99997                        
11  12    Exhibition Obelisk  116.45108  39.94371       Unison Lookout   

                    loc1                  loc2                  loc3  
0                   None                  None                  None  
1     Exhibition Obelisk                  None                  None  
2                   None                  None                  None  
3     Exhibition Obelisk                  None                  None  
4       Ash Forest Tower  Prestige Mound Tower  Sapphire Mound Tower  
5   Sapphire Mound Tower                  None                  None  
6       Ash Forest Tower  Prestige Mound Tower                  None  
7       Rust Peak Pillar  Sapphire Mound Tower                  None  
8       Rust Peak Pillar                  None                  None  
9    Ruby Forest Obelisk      Ash Forest Tower                  None  
10                  None                  None                  None  
11     Domination Pillar                  None                  None  
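
The dot product works here because multiplying a string by True keeps it and multiplying by False gives an empty string, so each row's "sum" is the concatenation of the names selected by the mask. A minimal sketch on a hypothetical 3x3 distance matrix (the names A, B, C and the distances are invented for illustration):

```python
import pandas as pd

# Hypothetical location names; object dtype keeps plain Python strings.
names = pd.Series(["A", "B", "C"], dtype=object)

# Hypothetical symmetric distance matrix in km (values invented).
dm_small = pd.DataFrame([[0.0, 2.0, 9.0],
                         [2.0, 0.0, 4.0],
                         [9.0, 4.0, 0.0]])

# True where another location lies closer than 5 km.
mask = dm_small.ne(0) & dm_small.lt(5)

# True * "A," -> "A,", False * "A," -> "", summed per row, then the
# trailing comma is stripped.
joined = mask.dot(names + ",").str[:-1]
print(joined.tolist())
```

Splitting `joined` on the comma then yields one column per neighbour, as in the answer above. The trick assumes the names themselves contain no commas.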

To get the neighbours sorted by distance, use:

df1 = pd.DataFrame(dm, index=df.index, columns=df['location'])

df1 = df.join(df1.apply(lambda x: pd.Series(x[(x!=0)&(x < 5)].sort_values().index), axis=1)
                .add_prefix('loc'))
print (df1)
    id              location        lon       lat                  loc0  \
0    1            Onyx Spire  116.35425  39.87760   Ruby Forest Obelisk   
1    2        Unison Lookout  116.44333  39.93237    Exhibition Obelisk   
2    3       History Lookout  116.14857  39.73727                   NaN   
3    4     Domination Pillar  116.46387  39.96286    Exhibition Obelisk   
4    5           Union Tower  116.36373  39.95064      Ash Forest Tower   
5    6   Ruby Forest Obelisk  116.35786  39.89463            Onyx Spire   
6    7      Rust Peak Pillar  116.34870  39.98170  Prestige Mound Tower   
7    8      Ash Forest Tower  116.38461  39.94938           Union Tower   
8    9  Prestige Mound Tower  116.34052  39.98977      Rust Peak Pillar   
9   10  Sapphire Mound Tower  116.35063  39.92982           Union Tower   
10  11       Kinship Lookout  116.43020  39.99997                   NaN   
11  12    Exhibition Obelisk  116.45108  39.94371        Unison Lookout   

                    loc1                 loc2                  loc3  
0                    NaN                  NaN                   NaN  
1      Domination Pillar                  NaN                   NaN  
2                    NaN                  NaN                   NaN  
3         Unison Lookout                  NaN                   NaN  
4   Sapphire Mound Tower     Rust Peak Pillar  Prestige Mound Tower  
5   Sapphire Mound Tower                  NaN                   NaN  
6            Union Tower     Ash Forest Tower                   NaN  
7   Sapphire Mound Tower     Rust Peak Pillar                   NaN  
8            Union Tower                  NaN                   NaN  
9       Ash Forest Tower  Ruby Forest Obelisk                   NaN  
10                   NaN                  NaN                   NaN  
11     Domination Pillar                  NaN                   NaN  
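
Both snippets above start from the matrix dm, which the question builds by calling a Python function once per pair. For larger inputs, the same haversine formula can be evaluated with NumPy broadcasting instead. This is only a sketch: the function name haversine_matrix is hypothetical, and the radius of 6370 km is taken from the question's code.

```python
import numpy as np

def haversine_matrix(lat_deg, lon_deg, radius_km=6370.0):
    """Pairwise haversine distances in km for points given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row gives all pairwise differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Guard against rounding pushing a slightly outside [0, 1].
    a = np.clip(a, 0.0, 1.0)
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# Three of the points from the question: Onyx Spire, Unison Lookout,
# Exhibition Obelisk.
lat = [39.87760, 39.93237, 39.94371]
lon = [116.35425, 116.44333, 116.45108]
dm_fast = haversine_matrix(lat, lon)
print(dm_fast.round(3))
```

The result can be wrapped with pd.DataFrame(dm_fast, index=..., columns=...) and used in place of dm. Memory still grows with the square of the number of points.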

Here is an approach using BallTree, with the neighbours sorted from shortest to longest distance:

from sklearn.neighbors import BallTree
import pandas as pd
import numpy as np


data = { 'lon' : [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786, 116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
'lat' : [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463, 39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
'location' : ["Onyx Spire", "Unison Lookout", "History Lookout", "Domination Pillar", "Union Tower", "Ruby Forest Obelisk", "Rust Peak Pillar", "Ash Forest Tower", "Prestige Mound Tower", "Sapphire Mound Tower", "Kinship Lookout", "Exhibition Obelisk"]}

locations = pd.DataFrame.from_dict(data)

Create the BallTree:

locations_radians =  np.radians(locations[["lat","lon"]].values)
tree = BallTree(locations_radians, leaf_size=12, metric='haversine')
distance_in_meters = 5000
earth_radius = 6371000
    
radius = distance_in_meters / earth_radius
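
The division is needed because scikit-learn's haversine metric takes its inputs and returns its distances in radians, that is, as angles on a unit sphere. A small sketch of the conversion in both directions (the helper names are hypothetical, and the Earth radius is the same value as above):

```python
EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres, as used above

def metres_to_radians(metres):
    """Arc length on the Earth's surface -> angle on the unit sphere."""
    return metres / EARTH_RADIUS_M

def radians_to_metres(angle):
    """Angle on the unit sphere -> arc length on the Earth's surface."""
    return angle * EARTH_RADIUS_M

radius = metres_to_radians(5000)      # search radius for a 5 km query
back = radians_to_metres(radius)      # round trip, should give 5000 again
print(radius, back)
```

The same multiplication turns the distances returned by the tree back into metres.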

Note that I first sort is_within by distance and store the result in is_within_sorted:

is_within, distances = tree.query_radius(locations_radians, r=radius, count_only=False, return_distance=True) 

is_within_sorted = [ iw[ np.argsort(di) ] for iw,di in zip(is_within, distances) ]
distances_sorted = [np.sort(d) for d in distances]

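is_within contains arrays of different lengths, each holding the indices of the locations within the radius of one point. You can store these together with the actual distances.

Now I pad with NaN and create a DataFrame to join later: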

pad_with_nans = [ np.pad(locations.location[iw], (0,locations.lat.size), 'constant', constant_values=np.nan)[:locations.lat.size] for iw in is_within_sorted]
location_names = [ 'location_{}'.format(i) for i in range(locations.lat.size) ]
within_radius = pd.DataFrame(pad_with_nans, index=locations.index, columns=location_names)

Then

locations.join(within_radius)

gives

         lon       lat           location         location_0  \
0  116.35425  39.87760         Onyx Spire         Onyx Spire   
1  116.44333  39.93237     Unison Lookout     Unison Lookout   
2  116.14857  39.73727    History Lookout    History Lookout   
3  116.46387  39.96286  Domination Pillar  Domination Pillar   
4  116.36373  39.95064        Union Tower        Union Tower   

            location_1            location_2        location_3  \
0  Ruby Forest Obelisk                   NaN               NaN   
1   Exhibition Obelisk     Domination Pillar               NaN   
2                  NaN                   NaN               NaN   
3   Exhibition Obelisk        Unison Lookout               NaN   
4     Ash Forest Tower  Sapphire Mound Tower  Rust Peak Pillar   

             location_4  location_5  location_6  location_7  location_8  \
0                   NaN         NaN         NaN         NaN         NaN   
1                   NaN         NaN         NaN         NaN         NaN   
2                   NaN         NaN         NaN         NaN         NaN   
3                   NaN         NaN         NaN         NaN         NaN   
4  Prestige Mound Tower         NaN         NaN         NaN         NaN   

   location_9  location_10  location_11  
0         NaN          NaN          NaN  
1         NaN          NaN          NaN  
2         NaN          NaN          NaN  
3         NaN          NaN          NaN  
4         NaN          NaN          NaN  

A point is always within its own radius, so you can drop the first column.
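
A minimal sketch of that clean-up, using a small hypothetical stand-in for within_radius (names invented). It assumes the point itself sorts first because its distance is 0, which may not hold if two locations share identical coordinates:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for within_radius: column 0 is the point itself.
within_radius = pd.DataFrame({
    "location_0": ["A", "B", "C"],
    "location_1": ["B", "A", np.nan],
    "location_2": [np.nan, np.nan, np.nan],
})

# Drop the self column, then drop padding columns that are entirely NaN.
neighbours = (within_radius
              .drop(columns="location_0")
              .dropna(axis=1, how="all"))
print(neighbours)
```

Dropping the all-NaN columns also removes the unused padding, so the joined result only has as many neighbour columns as the densest row needs.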