Finding pairs of latitude and longitude within a certain radius in Python

Question

Given a DataFrame df as follows:

    id              location        lon       lat
0    1            Onyx Spire  116.35425  39.87760
1    2        Unison Lookout  116.44333  39.93237
2    3       History Lookout  116.14857  39.73727
3    4     Domination Pillar  116.46387  39.96286
4    5           Union Tower  116.36373  39.95064
5    6   Ruby Forest Obelisk  116.35786  39.89463
6    7      Rust Peak Pillar  116.34870  39.98170
7    8      Ash Forest Tower  116.38461  39.94938
8    9  Prestige Mound Tower  116.34052  39.98977
9   10  Sapphire Mound Tower  116.35063  39.92982
10  11       Kinship Lookout  116.43020  39.99997
11  12    Exhibition Obelisk  116.45108  39.94371

For each location, I need to find the names of the other locations whose distance from it is less than or equal to a given threshold, say 5 km.

The code is based on the answer at this link:

import pandas as pd
from scipy.spatial import distance
from math import sin, cos, sqrt, atan2, radians

def get_distance(point1, point2):
    R = 6370  # approximate Earth radius in km
    # each point is (lat, lon) in degrees
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])

    dlon = lon2 - lon1
    dlat = lat2 - lat1

    # haversine formula
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

all_points = df[['lat', 'lon']].values
dm = distance.cdist(all_points, all_points, get_distance)
pd.DataFrame(dm, index=df.index, columns=df.index)
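As an aside, the pairwise matrix can also be built without cdist calling a Python function once per pair. Below is a minimal sketch, assuming only NumPy and pandas, that vectorizes the same haversine formula (same R = 6370) with broadcasting; haversine_matrix is a hypothetical helper name, not a library function.

```python
import numpy as np
import pandas as pd

def haversine_matrix(lat, lon, R=6370.0):
    """Pairwise haversine distances in km; R matches the question's get_distance."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    # (n, 1) minus (1, n) broadcasts to an (n, n) matrix of differences
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    a = np.clip(a, 0.0, 1.0)  # guard against rounding slightly outside [0, 1]
    return R * 2 * np.arctan2(np.sqrt(a), np.sqrt(1 - a))

df = pd.DataFrame({
    'lat': [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463,
            39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
    'lon': [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786,
            116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
})
dm = haversine_matrix(df['lat'], df['lon'])
dist_df = pd.DataFrame(dm, index=df.index, columns=df.index)
print(dist_df.round(6))
```

It should reproduce the cdist matrix up to floating-point noise; memory grows as n², so for many thousands of points a tree-based search is likely the better fit.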

Output:

            0          1          2  ...          9         10         11
0    0.000000   9.736316  23.494395  ...   5.813891  15.066709  11.054762
1    9.736316   0.000000  33.222475  ...   7.908015   7.598415   1.423357
2   23.494395  33.222475   0.000000  ...  27.492814  37.822285  34.549129
3   13.312235   3.815179  36.787014  ...  10.327235   5.024900   2.391864
4    8.160542   7.082601  30.000842  ...   2.569988   7.883467   7.484839
5    1.918235   8.409888  25.009951  ...   3.960618  13.235325   9.641336
6   11.583243   9.752599  32.096627  ...   5.770232   7.233093   9.692770
7    8.389761   5.350670  31.017383  ...   3.622002   6.835323   5.700434
8   12.525586  10.838805  32.501864  ...   6.720541   7.722060  10.722467
9    5.813891   7.908015  27.492814  ...   0.000000  10.334273   8.701063
10  15.066709   7.598415  37.822285  ...  10.334273   0.000000   6.502921
11  11.054762   1.423357  34.549129  ...   8.701063   6.502921   0.000000

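But I would like to get output similar to the DataFrame below. Note that location1, location2, location3 are the names of the locations within 5 km of location (the paired names below may not be accurate; they are only an example to aid understanding). NaN means that no such location exists: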

    id             location  ...           location2         location3
0    1           Onyx Spire  ...                 NaN               NaN
1    2       Unison Lookout  ...                 NaN               NaN
2    3      History Lookout  ...                 NaN               NaN
3    4    Domination Pillar  ...                 NaN               NaN
4    5          Union Tower  ...                 NaN               NaN
5    6  Ruby Forest Obelisk  ...                 NaN               NaN
6    7     Rust Peak Pillar  ...                 NaN               NaN
7    8     Ash Forest Tower  ...     Kinship Lookout               NaN
8    9 Prestige Mound Tower  ...                 NaN               NaN
9   10 Sapphire Mound Tower  ...                 NaN               NaN
10  11      Kinship Lookout  ... Ruby Forest Obelisk Domination Pillar
11  12   Exhibition Obelisk  ...                 NaN               NaN

How can I do this in Python? Thanks.

Answer 1

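The idea is to build a mask of values that are non-zero and less than 5 km, then use DataFrame.dot for matrix multiplication, and finally use Series.str.split to create new columns that are joined to the original: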
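To see why the dot product concatenates names, here is a toy sketch with three hypothetical locations A, B, C and made-up distances. Multiplying a Python bool by a string gives the string (True) or '' (False), so each row of the matrix product is the comma-joined list of matching names.

```python
import pandas as pd

names = pd.Series(['A', 'B', 'C'])
# Hypothetical symmetric distance matrix in km (made-up values)
d = pd.DataFrame([[0.0, 2.0, 9.0],
                  [2.0, 0.0, 4.0],
                  [9.0, 4.0, 0.0]])

mask = d.ne(0) & d.lt(5)          # True where another location is closer than 5 km
joined = mask.dot(names + ',')    # True * 'B,' == 'B,', False * 'B,' == ''
print(joined.tolist())            # ['B,', 'A,C,', 'B,']

# Strip the trailing comma and split into one column per neighbour
out = joined.str[:-1].str.split(',', expand=True).add_prefix('loc')
print(out)
```

One consequence of this design: a location name that itself contains a comma would be split into two columns, so the separator should be a character that never occurs in the names.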

df1 = pd.DataFrame(dm, index=df.index, columns=df.index)

df = (df.join((df1.ne(0) & df1.lt(5)).dot(df['location']+ ',')
                                     .str[:-1]
                                     .str.split(',', expand=True)
                                     .add_prefix('loc')))

print (df)
    id              location        lon       lat                 loc0  \
0    1            Onyx Spire  116.35425  39.87760  Ruby Forest Obelisk   
1    2        Unison Lookout  116.44333  39.93237    Domination Pillar   
2    3       History Lookout  116.14857  39.73727                        
3    4     Domination Pillar  116.46387  39.96286       Unison Lookout   
4    5           Union Tower  116.36373  39.95064     Rust Peak Pillar   
5    6   Ruby Forest Obelisk  116.35786  39.89463           Onyx Spire   
6    7      Rust Peak Pillar  116.34870  39.98170          Union Tower   
7    8      Ash Forest Tower  116.38461  39.94938          Union Tower   
8    9  Prestige Mound Tower  116.34052  39.98977          Union Tower   
9   10  Sapphire Mound Tower  116.35063  39.92982          Union Tower   
10  11       Kinship Lookout  116.43020  39.99997                        
11  12    Exhibition Obelisk  116.45108  39.94371       Unison Lookout   

                    loc1                  loc2                  loc3  
0                   None                  None                  None  
1     Exhibition Obelisk                  None                  None  
2                   None                  None                  None  
3     Exhibition Obelisk                  None                  None  
4       Ash Forest Tower  Prestige Mound Tower  Sapphire Mound Tower  
5   Sapphire Mound Tower                  None                  None  
6       Ash Forest Tower  Prestige Mound Tower                  None  
7       Rust Peak Pillar  Sapphire Mound Tower                  None  
8       Rust Peak Pillar                  None                  None  
9    Ruby Forest Obelisk      Ash Forest Tower                  None  
10                  None                  None                  None  
11     Domination Pillar                  None                  None

To sort the matched names by distance, use:

df1 = pd.DataFrame(dm, index=df.index, columns=df['location'])

df1 = df.join(df1.apply(lambda x: pd.Series(x[(x!=0)&(x < 5)].sort_values().index), axis=1)
                .add_prefix('loc'))
print (df1)
    id              location        lon       lat                  loc0  \
0    1            Onyx Spire  116.35425  39.87760   Ruby Forest Obelisk   
1    2        Unison Lookout  116.44333  39.93237    Exhibition Obelisk   
2    3       History Lookout  116.14857  39.73727                   NaN   
3    4     Domination Pillar  116.46387  39.96286    Exhibition Obelisk   
4    5           Union Tower  116.36373  39.95064      Ash Forest Tower   
5    6   Ruby Forest Obelisk  116.35786  39.89463            Onyx Spire   
6    7      Rust Peak Pillar  116.34870  39.98170  Prestige Mound Tower   
7    8      Ash Forest Tower  116.38461  39.94938           Union Tower   
8    9  Prestige Mound Tower  116.34052  39.98977      Rust Peak Pillar   
9   10  Sapphire Mound Tower  116.35063  39.92982           Union Tower   
10  11       Kinship Lookout  116.43020  39.99997                   NaN   
11  12    Exhibition Obelisk  116.45108  39.94371        Unison Lookout   

                    loc1                 loc2                  loc3  
0                    NaN                  NaN                   NaN  
1      Domination Pillar                  NaN                   NaN  
2                    NaN                  NaN                   NaN  
3         Unison Lookout                  NaN                   NaN  
4   Sapphire Mound Tower     Rust Peak Pillar  Prestige Mound Tower  
5   Sapphire Mound Tower                  NaN                   NaN  
6            Union Tower     Ash Forest Tower                   NaN  
7   Sapphire Mound Tower     Rust Peak Pillar                   NaN  
8            Union Tower                  NaN                   NaN  
9       Ash Forest Tower  Ruby Forest Obelisk                   NaN  
10                   NaN                  NaN                   NaN  
11     Domination Pillar                  NaN                   NaN
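The lambda in the sorted version works row by row. A small sketch of what it does to a single row, with hypothetical names A to D (A playing the row's own location) and made-up distances in km:

```python
import pandas as pd

# One row of a distance matrix whose index holds location names (hypothetical, km)
row = pd.Series([0.0, 9.7, 4.1, 1.9], index=['A', 'B', 'C', 'D'])

# Same steps as the lambda: drop the zero self-distance, keep < 5 km, sort ascending
near = row[(row != 0) & (row < 5)].sort_values()

# Re-wrap the sorted names with a fresh 0, 1, ... index; after apply(axis=1)
# these positions become the columns that add_prefix renames to loc0, loc1, ...
names = pd.Series(near.index)
print(names.tolist())   # ['D', 'C']
```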

Answer 2

Here is an approach using BallTree, with the matches sorted from the shortest to the longest distance:

from sklearn.neighbors import BallTree
import pandas as pd
import numpy as np


data = { 'lon' : [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786, 116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
'lat' : [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463, 39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
'location' : ["Onyx Spire", "Unison Lookout", "History Lookout", "Domination Pillar", "Union Tower", "Ruby Forest Obelisk", "Rust Peak Pillar", "Ash Forest Tower", "Prestige Mound Tower", "Sapphire Mound Tower", "Kinship Lookout", "Exhibition Obelisk"]}

locations = pd.DataFrame.from_dict(data)

Create the BallTree:

locations_radians =  np.radians(locations[["lat","lon"]].values)
tree = BallTree(locations_radians, leaf_size=12, metric='haversine')

distance_in_meters = 5000
earth_radius = 6371000
    
radius = distance_in_meters / earth_radius
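Why divide by the Earth radius: with metric='haversine', BallTree works on the unit sphere, taking [lat, lon] in radians and returning central angles, so a 5 km search radius becomes 5000 / 6371000 ≈ 0.000785 rad. The pure-Python sketch below checks two pairs from the table against that radius by hand; central_angle is a hypothetical helper, not part of scikit-learn.

```python
import math

def central_angle(lat1, lon1, lat2, lon2):
    """Haversine central angle (radians) between two (lat, lon) points in degrees."""
    p1, l1, p2, l2 = map(math.radians, (lat1, lon1, lat2, lon2))
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin((l2 - l1) / 2) ** 2)
    return 2 * math.asin(math.sqrt(a))

distance_in_meters = 5000
earth_radius = 6371000  # mean Earth radius in metres, as in the answer
radius = distance_in_meters / earth_radius  # about 0.000785 rad

onyx = (39.87760, 116.35425)    # Onyx Spire
ruby = (39.89463, 116.35786)    # Ruby Forest Obelisk
unison = (39.93237, 116.44333)  # Unison Lookout
print(central_angle(*onyx, *ruby) <= radius)    # True  (about 1.9 km apart)
print(central_angle(*onyx, *unison) <= radius)  # False (about 9.7 km apart)
```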

Note that I first sort is_within by distance into is_within_sorted:

is_within, distances = tree.query_radius(locations_radians, r=radius, count_only=False, return_distance=True) 

is_within_sorted = [ iw[ np.argsort(di) ] for iw,di in zip(is_within, distances) ]
distances_sorted = [np.sort(d) for d in distances]
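scikit-learn's BallTree.query_radius also accepts sort_results=True (it requires return_distance=True), which returns each point's neighbours already ordered by distance and makes the manual argsort unnecessary. A minimal sketch on the first six points of the table, assuming scikit-learn is installed (check your version's documentation for the parameter):

```python
import numpy as np
from sklearn.neighbors import BallTree

# First six points of the question's table (degrees)
lat = [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463]
lon = [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786]

# The haversine metric expects [lat, lon] pairs in radians
X = np.radians(np.column_stack([lat, lon]))
tree = BallTree(X, metric='haversine')
radius = 5000 / 6371000  # 5 km as an angle on a sphere of radius 6371 km

# sort_results=True orders each row's neighbours by increasing distance
ind, dist = tree.query_radius(X, r=radius, return_distance=True, sort_results=True)
for i, (iw, di) in enumerate(zip(ind, dist)):
    print(i, iw.tolist(), (di * 6371000).round(1).tolist())
```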

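is_within contains arrays of different lengths holding the indices of the locations within the radius. You can store these together with the actual distances.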
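The distances come back as angles in radians, so multiplying by earth_radius converts them to metres. One way to store neighbours and distances together is a long table of pairs. The sketch below uses hypothetical, hand-written arrays shaped like the query_radius output (one index array and one radian array per point), not real results.

```python
import numpy as np
import pandas as pd

names = np.array(['Onyx Spire', 'Unison Lookout', 'Ruby Forest Obelisk'])
earth_radius = 6371000  # metres

# Hypothetical ragged results shaped like query_radius(..., return_distance=True)
is_within = [np.array([0, 2]), np.array([1]), np.array([2, 0])]
distances = [np.array([0.0, 3.0e-4]), np.array([0.0]), np.array([0.0, 3.0e-4])]

pairs = pd.DataFrame({
    'location': np.repeat(names, [len(iw) for iw in is_within]),
    'neighbour': names[np.concatenate(is_within)],
    'distance_m': np.concatenate(distances) * earth_radius,  # radians -> metres
})
pairs = pairs[pairs['location'] != pairs['neighbour']]  # drop each point's self-match
print(pairs)
```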

Now I pad with NaN and create a DataFrame to join later:

pad_with_nans = [ np.pad(locations.location[iw], (0,locations.lat.size), 'constant', constant_values=np.nan)[:locations.lat.size] for iw in is_within_sorted]
location_names = [ 'location_{}'.format(i) for i in range(locations.lat.size) ]

within_radius = pd.DataFrame(pad_with_nans, index=locations.index, columns=location_names)
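As an alternative to np.pad, pandas pads ragged lists by itself: building a DataFrame from a list of lists of unequal length fills the missing cells with missing values. A sketch with hypothetical neighbour lists:

```python
import pandas as pd

# Hypothetical neighbour lists of different lengths (the last point has none)
neighbours = [['Ruby Forest Obelisk'],
              ['Exhibition Obelisk', 'Domination Pillar'],
              []]

# Shorter rows are padded with missing values up to the longest row
wide = pd.DataFrame(neighbours).add_prefix('location_')
print(wide)
```

With the answer's variables, the input would presumably be [locations.location.values[iw].tolist() for iw in is_within_sorted], passed together with index=locations.index; unlike the np.pad version, this creates only as many columns as the longest neighbour list.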

Then

locations.join(within_radius)

gives:

         lon       lat           location         location_0  \
0  116.35425  39.87760         Onyx Spire         Onyx Spire   
1  116.44333  39.93237     Unison Lookout     Unison Lookout   
2  116.14857  39.73727    History Lookout    History Lookout   
3  116.46387  39.96286  Domination Pillar  Domination Pillar   
4  116.36373  39.95064        Union Tower        Union Tower   

            location_1            location_2        location_3  \
0  Ruby Forest Obelisk                   NaN               NaN   
1   Exhibition Obelisk     Domination Pillar               NaN   
2                  NaN                   NaN               NaN   
3   Exhibition Obelisk        Unison Lookout               NaN   
4     Ash Forest Tower  Sapphire Mound Tower  Rust Peak Pillar   

             location_4  location_5  location_6  location_7  location_8  \
0                   NaN         NaN         NaN         NaN         NaN   
1                   NaN         NaN         NaN         NaN         NaN   
2                   NaN         NaN         NaN         NaN         NaN   
3                   NaN         NaN         NaN         NaN         NaN   
4  Prestige Mound Tower         NaN         NaN         NaN         NaN   

   location_9  location_10  location_11  
0         NaN          NaN          NaN  
1         NaN          NaN          NaN  
2         NaN          NaN          NaN  
3         NaN          NaN          NaN  
4         NaN          NaN          NaN

Each point is always within its own radius, so you can drop the first column.
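Rather than dropping location_0 afterwards, the self-match can be removed from each index array before padding. The sketch uses index arrays that mirror the first rows of the output above, and relies on the query points being the same points the tree was built on (so row i's own index is i). Filtering on that index instead of slicing iw[1:] also behaves correctly if two points share identical coordinates and tie at distance 0.

```python
import numpy as np

# Index arrays mirroring the first rows of the output above (sorted by distance)
is_within_sorted = [np.array([0, 5]), np.array([1, 11, 3]), np.array([2])]

# Keep every match except the query point itself
neighbours_only = [iw[iw != i] for i, iw in enumerate(is_within_sorted)]
print([n.tolist() for n in neighbours_only])  # [[5], [11, 3], []]
```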
