Finding the distance between lat/long points
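I'm a bit stuck. I have a CSV that contains:
Site Name
Latitude
Longitude
The CSV contains 100,000 locations. For each location I need to generate a comma-separated list of the other locations within 5KM.
I have tried the code below. It transposes the table and gives me 100,000 columns and 100,000 rows, with the distances filled in as the result. But I'm not sure how to create a new pandas column that contains a list of all the sites within 5KM.
Can you help?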
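from geopy.distance import geodesic

def distance(row, lat_long_compare):
    # geodesic distance in km between this row's point and the comparison point
    lat_long = (row['latitude'], row['longitude'])
    try:
        return round(geodesic(lat_long, lat_long_compare).kilometers, 2)
    except (ValueError, TypeError):
        return 9999  # sentinel for missing or invalid coordinates

# d maps site name -> {'latitude': ..., 'longitude': ...}
for key, value in d.items():
    lat_long_compare = (value['latitude'], value['longitude'])
    # one new column per site: the distance from every row to that site
    df[key] = df.apply(distance, axis=1, args=(lat_long_compare,))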
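A minimal sketch of the same loop that avoids building the 100,000 × 100,000 distance table: for each site, compute the distance to every site at once with a vectorized haversine formula in NumPy, and keep only the names inside the radius. This swaps geopy's ellipsoidal geodesic for a spherical-Earth approximation (radius 6371 km, an assumption), so distances can differ slightly from geopy's; the function name sites_within_radius is hypothetical. Memory stays O(n) per site, but the total work is still O(n²).

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # assumption: spherical Earth with mean radius

def sites_within_radius(df, radius_km):
    """Hypothetical helper: comma-separated names of other sites within radius_km of each row."""
    lat = np.radians(df['latitude'].to_numpy())
    lon = np.radians(df['longitude'].to_numpy())
    names = df['Site Name'].to_numpy()
    result = []
    for i in range(len(df)):
        # haversine distance (km) from site i to every site
        a = (np.sin((lat - lat[i]) / 2) ** 2
             + np.cos(lat[i]) * np.cos(lat) * np.sin((lon - lon[i]) / 2) ** 2)
        dist_km = 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))
        # keep sites inside the radius, excluding site i itself
        mask = dist_km <= radius_km
        mask[i] = False
        result.append(', '.join(names[mask]))
    return pd.Series(result, index=df.index)

destinations = {'bigben': {'latitude': 51.510357, 'longitude': -0.116773},
                'heathrow': {'latitude': 51.470020, 'longitude': -0.454295},
                'alton_towers': {'latitude': 52.987662716, 'longitude': -1.888829778}}
df = pd.DataFrame.from_dict(destinations, orient='index').rename_axis('Site Name').reset_index()
df['Sites_within_50km'] = sites_within_radius(df, 50)
print(df[['Site Name', 'Sites_within_50km']])
```

Returning plain strings gives the comma-separated format the question asks for; an empty string marks a site with no neighbours.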
Some sample data could be:
destinations = { 'bigben' : {'latitude': 51.510357,
'longitude': -0.116773},
'heathrow' : {'latitude': 51.470020,
'longitude': -0.454295},
'alton_towers' : {'latitude': 52.987662716,
'longitude': -1.888829778}
}
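bigben is 0.8KM from the London Eye
The London Eye is 23.55KM from heathrow
alton_towers is 204.63KM from the London Eye
So, in this case, the field should only show bigben.
So we get:
Site | Sites within 5KM
28, bigben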
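The distances quoted above can be sanity-checked with the haversine formula. A minimal sketch, assuming the London Eye sits at roughly (51.5033, -0.1196); those coordinates are not in the question and are an approximation. Haversine treats the Earth as a sphere, so its results differ slightly from geodesic figures such as 23.55 km.

```python
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

destinations = {'bigben': {'latitude': 51.510357, 'longitude': -0.116773},
                'heathrow': {'latitude': 51.470020, 'longitude': -0.454295},
                'alton_towers': {'latitude': 52.987662716, 'longitude': -1.888829778}}
london_eye = (51.5033, -0.1196)  # assumed, approximate coordinates

# names of sample sites within 5 km of the (assumed) London Eye position
within_5km = [name for name, c in destinations.items()
              if haversine_km(london_eye, (c['latitude'], c['longitude'])) <= 5]
print(within_5km)  # ['bigben']
```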
Here is one way, using scikit-learn's NearestNeighbors.
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

# data from your input
df = pd.DataFrame.from_dict(destinations, orient='index').rename_axis('Site Name').reset_index()

radius = 50  # change to whatever, in km

# create the algorithm with the radius (km converted to radians via Earth's radius, 6371 km)
# and the haversine metric for geospatial distance
neigh = NearestNeighbors(radius=radius/6371, metric='haversine')

# fit the data in radians; haversine expects (latitude, longitude) order
neigh.fit(df[['latitude', 'longitude']].to_numpy()*np.pi/180)

# extract result and transform to get the expected output
df[f'Site_within_{radius}km'] = (
    pd.Series(neigh.radius_neighbors()[1])  # array of neighbour indices per row (the row itself is excluded)
    .explode()
    .map(df['Site Name'])  # get the site name from row index
    .groupby(level=0)  # transform back to row-row relation
    .agg(list)  # rows with no neighbour get [nan]; for a string, add .dropna() before .groupby and use ', '.join
)
print(df)
Site Name latitude longitude Site_within_50km
0 bigben 51.510357 -0.116773 [heathrow]
1 heathrow 51.470020 -0.454295 [bigben]
2 alton_towers 52.987663 -1.888830 [nan]
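Since the question asks for a comma-separated list, here is a minimal sketch of the same NearestNeighbors approach that builds the string directly from the index arrays, so sites without neighbours get an empty string instead of [nan]. The function name sites_within is hypothetical; as in the answer, calling radius_neighbors() with no argument queries each fitted point against the others, excluding itself.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import NearestNeighbors

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption, as in the answer

def sites_within(df, radius_km):
    """Hypothetical helper: comma-separated neighbour names within radius_km."""
    coords = np.radians(df[['latitude', 'longitude']].to_numpy())
    neigh = NearestNeighbors(radius=radius_km / EARTH_RADIUS_KM, metric='haversine').fit(coords)
    # no X given: each fitted point is queried, and not counted as its own neighbour
    _, indices = neigh.radius_neighbors()
    names = df['Site Name'].to_numpy()
    return pd.Series([', '.join(names[i] for i in idx) for idx in indices], index=df.index)

destinations = {'bigben': {'latitude': 51.510357, 'longitude': -0.116773},
                'heathrow': {'latitude': 51.470020, 'longitude': -0.454295},
                'alton_towers': {'latitude': 52.987662716, 'longitude': -1.888829778}}
df = pd.DataFrame.from_dict(destinations, orient='index').rename_axis('Site Name').reset_index()
df['Site_within_50km'] = sites_within(df, 50)
print(df)
```

Neighbour order within each string is not guaranteed to be sorted by distance, since radius_neighbors does not sort results by default.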
Another way:
import numpy as np
import pandas as pd
from sklearn.metrics import DistanceMetric  # was sklearn.neighbors.DistanceMetric before scikit-learn 1.0

# data from the question, indexed by site name
df = pd.DataFrame.from_dict(destinations, orient='index')

# haversine expects (latitude, longitude) in radians
coords = np.radians(df[['latitude', 'longitude']].to_numpy())

# assume a spherical Earth radius of 6373 km
dist = DistanceMetric.get_metric('haversine')
df = pd.DataFrame(dist.pairwise(coords)*6373, columns=df.index, index=df.index)

# True where another site lies within 50 km (> 0 excludes the site itself)
s = df.gt(0) & df.le(50)
df['Site_within_50km'] = s.agg(lambda x: x.index[x].values, axis=1)  # filter
bigben heathrow alton_towers Site_within_50km
bigben 0.000000 23.802459 203.857533 [heathrow]
heathrow 23.802459 0.000000 195.048961 [bigben]
alton_towers 203.857533 195.048961 0.000000 []
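The pairwise matrix above is n × n: for the question's 100,000 sites that is 10^10 float64 values, about 80 GB, which is unlikely to fit in memory. A minimal sketch that keeps the same pairwise idea but computes it a block of rows at a time with scikit-learn's haversine_distances, so only a chunk_size × n block exists at once (500 × 100,000 × 8 bytes ≈ 400 MB). The function name sites_within_chunked and the default chunk_size are hypothetical choices; total work is still O(n²).

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_KM = 6373.0  # same spherical radius as the answer above

def sites_within_chunked(df, radius_km, chunk_size=500):
    """Hypothetical helper: comma-separated names within radius_km,
    computing the distance matrix chunk_size rows at a time."""
    coords = np.radians(df[['latitude', 'longitude']].to_numpy())
    names = df.index.to_numpy()
    out = []
    for start in range(0, len(coords), chunk_size):
        # (chunk, n) block of great-circle distances in km
        block = haversine_distances(coords[start:start + chunk_size], coords) * EARTH_RADIUS_KM
        for row in block:
            # > 0 excludes the site itself, as in the answer above
            out.append(', '.join(names[(row > 0) & (row <= radius_km)]))
    return pd.Series(out, index=df.index)

destinations = {'bigben': {'latitude': 51.510357, 'longitude': -0.116773},
                'heathrow': {'latitude': 51.470020, 'longitude': -0.454295},
                'alton_towers': {'latitude': 52.987662716, 'longitude': -1.888829778}}
df = pd.DataFrame.from_dict(destinations, orient='index')
df['Site_within_50km'] = sites_within_chunked(df, 50)
print(df)
```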