Finding pairs of latitude and longitude within a certain radius in Python
Given a DataFrame df as follows:
id location lon lat
0 1 Onyx Spire 116.35425 39.87760
1 2 Unison Lookout 116.44333 39.93237
2 3 History Lookout 116.14857 39.73727
3 4 Domination Pillar 116.46387 39.96286
4 5 Union Tower 116.36373 39.95064
5 6 Ruby Forest Obelisk 116.35786 39.89463
6 7 Rust Peak Pillar 116.34870 39.98170
7 8 Ash Forest Tower 116.38461 39.94938
8 9 Prestige Mound Tower 116.34052 39.98977
9 10 Sapphire Mound Tower 116.35063 39.92982
10 11 Kinship Lookout 116.43020 39.99997
11 12 Exhibition Obelisk 116.45108 39.94371
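For each location, I need to find the names of the other locations whose distance to it is less than or equal to some threshold, say 5 km.
The code below is based on the answer from this link: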
import pandas as pd
from scipy.spatial import distance
from math import sin, cos, sqrt, atan2, radians

def get_distance(point1, point2):
    # Haversine distance in km between two (lat, lon) points given in degrees
    R = 6370  # approximate Earth radius in km
    lat1 = radians(point1[0])
    lon1 = radians(point1[1])
    lat2 = radians(point2[0])
    lon2 = radians(point2[1])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat / 2)**2 + cos(lat1) * cos(lat2) * sin(dlon / 2)**2
    c = 2 * atan2(sqrt(a), sqrt(1 - a))
    return R * c

all_points = df[['lat', 'lon']].values
dm = distance.cdist(all_points, all_points, get_distance)
pd.DataFrame(dm, index=df.index, columns=df.index)
Output:
0 1 2 ... 9 10 11
0 0.000000 9.736316 23.494395 ... 5.813891 15.066709 11.054762
1 9.736316 0.000000 33.222475 ... 7.908015 7.598415 1.423357
2 23.494395 33.222475 0.000000 ... 27.492814 37.822285 34.549129
3 13.312235 3.815179 36.787014 ... 10.327235 5.024900 2.391864
4 8.160542 7.082601 30.000842 ... 2.569988 7.883467 7.484839
5 1.918235 8.409888 25.009951 ... 3.960618 13.235325 9.641336
6 11.583243 9.752599 32.096627 ... 5.770232 7.233093 9.692770
7 8.389761 5.350670 31.017383 ... 3.622002 6.835323 5.700434
8 12.525586 10.838805 32.501864 ... 6.720541 7.722060 10.722467
9 5.813891 7.908015 27.492814 ... 0.000000 10.334273 8.701063
10 15.066709 7.598415 37.822285 ... 10.334273 0.000000 6.502921
11 11.054762 1.423357 34.549129 ... 8.701063 6.502921 0.000000
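For larger tables, calling a Python function once per pair through cdist can become slow. As a rough sketch, the same haversine formula can be evaluated for all pairs at once with NumPy broadcasting. The helper name haversine_matrix and the 6371 km radius are my own choices here, not part of the question, so the numbers differ very slightly from the matrix above (which uses 6370 km).

```python
import numpy as np
import pandas as pd


def haversine_matrix(lat_deg, lon_deg, radius_km=6371.0):
    """Pairwise great-circle distances in km, computed with broadcasting.

    lat_deg / lon_deg are 1-D sequences of coordinates in degrees.
    radius_km is an assumed mean Earth radius.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Column vector minus row vector gives every pairwise difference.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# First three rows of the question's data.
sample = pd.DataFrame({
    "location": ["Onyx Spire", "Unison Lookout", "History Lookout"],
    "lon": [116.35425, 116.44333, 116.14857],
    "lat": [39.87760, 39.93237, 39.73727],
})
dm_sample = pd.DataFrame(
    haversine_matrix(sample["lat"], sample["lon"]),
    index=sample["location"],
    columns=sample["location"],
)
print(dm_sample.round(3))
```

The result is symmetric with zeros on the diagonal, like the cdist output, but it needs memory for the full n x n matrix.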
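But I would like to get an output similar to the following DataFrame. Note that location1, location2 and location3 are the names of the locations within <= 5 km of location (the paired names here may not be accurate, they are only an example to aid understanding); NaN means that no such location exists: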
id location ... location2 location3
0 1 Onyx Spire ... NaN NaN
1 2 Unison Lookout ... NaN NaN
2 3 History Lookout ... NaN NaN
3 4 Domination Pillar ... NaN NaN
4 5 Union Tower ... NaN NaN
5 6 Ruby Forest Obelisk ... NaN NaN
6 7 Rust Peak Pillar ... NaN NaN
7 8 Ash Forest Tower ... Kinship Lookout NaN
8 9 Prestige Mound Tower ... NaN NaN
9 10 Sapphire Mound Tower ... NaN NaN
10 11 Kinship Lookout ... Ruby Forest Obelisk Domination Pillar
11 12 Exhibition Obelisk ... NaN NaN
How can I do this in Python? Thanks.
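The idea is to create a mask for values that are non-zero and less than 5 km, then use DataFrame.dot for matrix multiplication to concatenate the matching names, and finally use Series.str.split to build new columns that are joined to the original (use df1.le(5) instead of df1.lt(5) if exactly 5 km should also count):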
df1 = pd.DataFrame(dm, index=df.index, columns=df.index)
df = (df.join((df1.ne(0) & df1.lt(5)).dot(df['location'] + ',')
                .str[:-1]
                .str.split(',', expand=True)
                .add_prefix('loc')))
print(df)
id location lon lat loc0 \
0 1 Onyx Spire 116.35425 39.87760 Ruby Forest Obelisk
1 2 Unison Lookout 116.44333 39.93237 Domination Pillar
2 3 History Lookout 116.14857 39.73727
3 4 Domination Pillar 116.46387 39.96286 Unison Lookout
4 5 Union Tower 116.36373 39.95064 Rust Peak Pillar
5 6 Ruby Forest Obelisk 116.35786 39.89463 Onyx Spire
6 7 Rust Peak Pillar 116.34870 39.98170 Union Tower
7 8 Ash Forest Tower 116.38461 39.94938 Union Tower
8 9 Prestige Mound Tower 116.34052 39.98977 Union Tower
9 10 Sapphire Mound Tower 116.35063 39.92982 Union Tower
10 11 Kinship Lookout 116.43020 39.99997
11 12 Exhibition Obelisk 116.45108 39.94371 Unison Lookout
loc1 loc2 loc3
0 None None None
1 Exhibition Obelisk None None
2 None None None
3 Exhibition Obelisk None None
4 Ash Forest Tower Prestige Mound Tower Sapphire Mound Tower
5 Sapphire Mound Tower None None
6 Ash Forest Tower Prestige Mound Tower None
7 Rust Peak Pillar Sapphire Mound Tower None
8 Rust Peak Pillar None None
9 Ruby Forest Obelisk Ash Forest Tower None
10 None None None
11 Domination Pillar None None
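The join-then-split trick relies on location names containing no commas, and the non-zero test would also hide a different location that happens to share the same coordinates. Here is a rough sketch of a variant that avoids both by masking the diagonal and collecting names directly. The function neighbor_columns and the tiny 3 x 3 distance matrix are made up for illustration, and the threshold is inclusive (<=) as in the question.

```python
import numpy as np
import pandas as pd


def neighbor_columns(dist_km, names, max_km=5.0, prefix="loc"):
    """Return a DataFrame with one column per neighbor within max_km.

    dist_km is a square distance matrix; names[i] labels row/column i.
    Rows with fewer neighbors are padded with NaN.
    """
    dist = np.asarray(dist_km, dtype=float)
    names = np.asarray(names, dtype=object)
    within = dist <= max_km
    np.fill_diagonal(within, False)  # a point is not its own neighbor
    rows = [list(names[row]) for row in within]
    width = max((len(r) for r in rows), default=0)
    padded = [r + [np.nan] * (width - len(r)) for r in rows]
    return pd.DataFrame(padded, columns=[f"{prefix}{i}" for i in range(width)])


# Hypothetical distances in km between three places A, B and C.
toy_dist = [[0.0, 2.0, 9.0],
            [2.0, 0.0, 4.0],
            [9.0, 4.0, 0.0]]
out = neighbor_columns(toy_dist, ["A", "B", "C"])
print(out)
```

With the question's data this would be called as neighbor_columns(dm, df['location']) and the result joined back with df.join(...), assuming df has the default 0..n-1 index.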
To get the neighbors sorted by distance, use:
df1 = pd.DataFrame(dm, index=df.index, columns=df['location'])
df1 = df.join(df1.apply(lambda x: pd.Series(x[(x != 0) & (x < 5)].sort_values().index), axis=1)
                 .add_prefix('loc'))
print(df1)
id location lon lat loc0 \
0 1 Onyx Spire 116.35425 39.87760 Ruby Forest Obelisk
1 2 Unison Lookout 116.44333 39.93237 Exhibition Obelisk
2 3 History Lookout 116.14857 39.73727 NaN
3 4 Domination Pillar 116.46387 39.96286 Exhibition Obelisk
4 5 Union Tower 116.36373 39.95064 Ash Forest Tower
5 6 Ruby Forest Obelisk 116.35786 39.89463 Onyx Spire
6 7 Rust Peak Pillar 116.34870 39.98170 Prestige Mound Tower
7 8 Ash Forest Tower 116.38461 39.94938 Union Tower
8 9 Prestige Mound Tower 116.34052 39.98977 Rust Peak Pillar
9 10 Sapphire Mound Tower 116.35063 39.92982 Union Tower
10 11 Kinship Lookout 116.43020 39.99997 NaN
11 12 Exhibition Obelisk 116.45108 39.94371 Unison Lookout
loc1 loc2 loc3
0 NaN NaN NaN
1 Domination Pillar NaN NaN
2 NaN NaN NaN
3 Unison Lookout NaN NaN
4 Sapphire Mound Tower Rust Peak Pillar Prestige Mound Tower
5 Sapphire Mound Tower NaN NaN
6 Union Tower Ash Forest Tower NaN
7 Sapphire Mound Tower Rust Peak Pillar NaN
8 Union Tower NaN NaN
9 Ash Forest Tower Ruby Forest Obelisk NaN
10 NaN NaN NaN
11 Domination Pillar NaN NaN
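The row-wise apply can be slow on big matrices. One possible alternative, sketched below on made-up distances, is to reshape the matrix to long format, filter and sort it, then number each location's neighbors and pivot back to wide. The column names neighbor, km and rank are my own labels.

```python
import pandas as pd

# Hypothetical distance matrix in km, labeled by location name.
dist_km = pd.DataFrame(
    [[0.0, 2.0, 9.0],
     [2.0, 0.0, 4.0],
     [9.0, 4.0, 0.0]],
    index=["A", "B", "C"],
    columns=["A", "B", "C"],
)

# One row per (location, neighbor) pair.
long = (dist_km.rename_axis("location")
               .reset_index()
               .melt(id_vars="location", var_name="neighbor", value_name="km"))

# Drop self-pairs, keep pairs within 5 km, nearest first.
long = long[(long["location"] != long["neighbor"]) & (long["km"] <= 5)]
long = long.sort_values(["location", "km"])

# rank = 0 for the nearest neighbor, 1 for the next, and so on.
long["rank"] = long.groupby("location").cumcount()
wide = (long.pivot(index="location", columns="rank", values="neighbor")
            .add_prefix("loc"))
print(wide)
```

Locations without any neighbor do not appear in wide at all; a left join back onto the original frame on the location column fills those rows with NaN.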
Here is an approach using BallTree, with the results sorted from shortest to longest distance:
from sklearn.neighbors import BallTree
import pandas as pd
import numpy as np
data = { 'lon' : [116.35425, 116.44333, 116.14857, 116.46387, 116.36373, 116.35786, 116.34870, 116.38461, 116.34052, 116.35063, 116.43020, 116.45108],
'lat' : [39.87760, 39.93237, 39.73727, 39.96286, 39.95064, 39.89463, 39.98170, 39.94938, 39.98977, 39.92982, 39.99997, 39.94371],
'location' : ["Onyx Spire", "Unison Lookout", "History Lookout", "Domination Pillar", "Union Tower", "Ruby Forest Obelisk", "Rust Peak Pillar", "Ash Forest Tower", "Prestige Mound Tower", "Sapphire Mound Tower", "Kinship Lookout", "Exhibition Obelisk"]}
locations = pd.DataFrame.from_dict(data)
Create the BallTree:
locations_radians = np.radians(locations[["lat","lon"]].values)
tree = BallTree(locations_radians, leaf_size=12, metric='haversine')
distance_in_meters = 5000
earth_radius = 6371000
radius = distance_in_meters / earth_radius
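With the haversine metric the tree works on a unit sphere, so both the search radius and the returned distances are angles in radians rather than lengths. A small sketch of the conversion in both directions follows; the helper names are mine and the Earth radius is the same mean value used above.

```python
EARTH_RADIUS_M = 6_371_000  # assumed mean Earth radius in meters


def meters_to_radians(meters):
    """Ground distance -> central angle, suitable as the query radius r."""
    return meters / EARTH_RADIUS_M


def radians_to_km(angle):
    """Central angle returned by the tree -> ground distance in km."""
    return angle * EARTH_RADIUS_M / 1000.0


radius = meters_to_radians(5000)
print(radius)                 # angle corresponding to 5 km
print(radians_to_km(radius))  # back to kilometers
```

The same radians_to_km step applies to every value in the distances arrays if you want to report them in kilometers.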
Note that I first sort is_within by distance into is_within_sorted:
is_within, distances = tree.query_radius(locations_radians, r=radius, count_only=False, return_distance=True)
is_within_sorted = [ iw[ np.argsort(di) ] for iw,di in zip(is_within, distances) ]
distances_sorted = [np.sort(d) for d in distances]
is_within contains arrays of varying length that hold the indices of the locations within the radius. You can store these together with the actual distances.
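One way to store indices and distances together is a long table with one row per (location, neighbor) pair. The sketch below assumes inputs shaped like the two query_radius results, that is, one index array and one distance array (in radians) per location. The arrays used here are made-up stand-ins so the example runs without building a tree, and neighbors_long is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def neighbors_long(indices, distances_rad, names):
    """Flatten ragged per-location results into one tidy DataFrame."""
    names = np.asarray(names, dtype=object)
    counts = [len(ix) for ix in indices]
    # Repeat each source index once per neighbor it has.
    source = np.repeat(np.arange(len(counts)), counts)
    target = np.concatenate(list(indices))
    km = np.concatenate(list(distances_rad)) * EARTH_RADIUS_KM
    out = pd.DataFrame({
        "source": source,
        "location": names[source],
        "neighbor": names[target],
        "km": km,
    })
    out = out[source != target]  # drop each point's match with itself
    return out.sort_values(["source", "km"]).reset_index(drop=True)


# Made-up stand-ins for the arrays returned by the radius query.
toy_indices = [np.array([0, 1]), np.array([1, 0, 2]), np.array([2, 1])]
toy_dist_rad = [np.array([0.0, 0.0003]),
                np.array([0.0, 0.0003, 0.0005]),
                np.array([0.0, 0.0005])]
pairs = neighbors_long(toy_indices, toy_dist_rad, ["A", "B", "C"])
print(pairs)
```

With the real data the call would look like neighbors_long(is_within, distances, locations['location']).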
Now I pad with NaN and create a DataFrame to join later:
pad_with_nans = [ np.pad(locations.location[iw], (0,locations.lat.size), 'constant', constant_values=np.nan)[:locations.lat.size] for iw in is_within_sorted]
location_names = [ 'location_{}'.format(i) for i in range(locations.lat.size) ]
within_radius = pd.DataFrame(pad_with_nans, index=locations.index, columns=location_names)
We have
locations.join(within_radius)
which gives
lon lat location location_0 \
0 116.35425 39.87760 Onyx Spire Onyx Spire
1 116.44333 39.93237 Unison Lookout Unison Lookout
2 116.14857 39.73727 History Lookout History Lookout
3 116.46387 39.96286 Domination Pillar Domination Pillar
4 116.36373 39.95064 Union Tower Union Tower
location_1 location_2 location_3 \
0 Ruby Forest Obelisk NaN NaN
1 Exhibition Obelisk Domination Pillar NaN
2 NaN NaN NaN
3 Exhibition Obelisk Unison Lookout NaN
4 Ash Forest Tower Sapphire Mound Tower Rust Peak Pillar
location_4 location_5 location_6 location_7 location_8 \
0 NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN
4 Prestige Mound Tower NaN NaN NaN NaN
location_9 location_10 location_11
0 NaN NaN NaN
1 NaN NaN NaN
2 NaN NaN NaN
3 NaN NaN NaN
4 NaN NaN NaN
Each point is always within its own radius, so you can drop the first column.
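Dropping by position assumes the point itself always sorts first. If two locations share exactly the same coordinates they tie at distance zero, and I would not rely on which one comes first. A hedged alternative is to remove each point's own index from its neighbor list before padding. The helper drop_self and the toy arrays are illustrative only.

```python
import numpy as np


def drop_self(indices):
    """Remove index i from the i-th neighbor array, keeping the order."""
    cleaned = []
    for i, ix in enumerate(indices):
        ix = np.asarray(ix)
        cleaned.append(ix[ix != i])
    return cleaned


# Made-up neighbor index arrays, one per location.
toy_indices = [np.array([0, 1]), np.array([2, 1, 0]), np.array([2])]
cleaned = drop_self(toy_indices)
print([c.tolist() for c in cleaned])
```

Applied to is_within_sorted, this leaves only true neighbors, so no column has to be discarded afterwards.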