Pairwise distance between collections of points with latitude and longitude
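I have two sets of points with latitude and longitude, and I want to compute the pairwise distances between them. This works when both lists are small: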
from geopy.distance import distance

c1 = [(-34.7102, -58.3853),
      (-32.9406, -60.7136),
      (-34.6001, -58.3729),
      (-38.9412, -67.9948),
      (-35.1871, -59.0968)]
c2 = [(-43.2568, -65.2853),
      (-31.4038, -64.1645),
      (-34.7634, -58.2120),
      (-34.4819, -58.5828),
      (-34.5669, -58.4515),
      (-34.6356, -68.369),
      (-34.4048, -58.6896)]

distances = []
for c in c1:
    this_row = [distance(c, x).meters for x in c2]
    distances.append(this_row)
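However, the real c1 and c2 contain 50,000 and 15,000 points respectively. When I run the script above on my real data, it takes a very long time. I'm looking for something efficient, such as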
distances = scipy.spatial.distance.cdist(c1, c2)
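This is very fast, but as far as I can tell the result is in an unspecified unit (it is just the Euclidean distance between the raw latitude/longitude pairs). I'm looking for results in meters.
Is there a way to rewrite the first script more efficiently?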
I looked at a few options. Here's what I learned; I hope it helps:
scipy.spatial.distance.cdist: it accepts a callable metric argument, but I suspect a custom Python function would slow things down again.
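To see why a callable metric is slow, here is a small sketch. Assumptions: a hand-written spherical haversine stands in for geopy, and the `calls` counter is purely illustrative. cdist invokes the Python callable once for every (row of c1, row of c2) pair, so on the real data that would mean 50,000 × 15,000 = 750 million Python-level calls.

```python
import math

import numpy as np
from scipy.spatial.distance import cdist

EARTH_RADIUS_M = 6_371_008.8  # assumption: spherical Earth with mean radius in meters
calls = 0  # illustrative counter of how often cdist invokes the callable


def haversine_m(u, v):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    global calls
    calls += 1
    lat1, lon1, lat2, lon2 = map(math.radians, (u[0], u[1], v[0], v[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


c1 = np.array([(-34.7102, -58.3853), (-32.9406, -60.7136), (-34.6001, -58.3729),
               (-38.9412, -67.9948), (-35.1871, -59.0968)])
c2 = np.array([(-43.2568, -65.2853), (-31.4038, -64.1645), (-34.7634, -58.2120),
               (-34.4819, -58.5828), (-34.5669, -58.4515), (-34.6356, -68.369),
               (-34.4048, -58.6896)])

d = cdist(c1, c2, haversine_m)  # meters, shape (5, 7)
print(d.shape, calls)  # the callable ran once per pair: 5 * 7 times
```

The result is in meters, but the per-pair Python call is exactly the overhead the loop in the question already has.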
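sklearn.neighbors.DistanceMetric: it has a built-in haversine metric. Anyway, I didn't manage to get it working properly, but I'm sure you'll find a way. Also, its documentation notes that for many metrics DistanceMetric.pairwise will be slower than scipy's cdist.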
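The haversine metric treats the Earth as a sphere. Below is a minimal numpy sketch of the same formula, vectorized with broadcasting so there is no per-pair Python call. Assumptions: inputs are (lat, lon) in degrees, and the Earth's mean radius is 6,371,008.8 m; geopy's default geodesic distance uses the WGS84 ellipsoid, so results will differ slightly (up to roughly 0.5%). With scikit-learn, `sklearn.metrics.pairwise.haversine_distances(np.radians(c1), np.radians(c2))` returns the corresponding angles on the unit sphere, which you would multiply by the radius to get meters.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_008.8  # assumption: mean Earth radius; spherical model, not WGS84


def haversine_matrix_m(a, b):
    """Pairwise great-circle distances in meters.

    a: (n, 2) array and b: (m, 2) array of (lat, lon) in degrees.
    Returns an (n, m) array; broadcasting replaces the Python double loop.
    """
    lat1 = np.radians(a[:, 0])[:, None]  # shape (n, 1)
    lon1 = np.radians(a[:, 1])[:, None]
    lat2 = np.radians(b[:, 0])[None, :]  # shape (1, m)
    lon2 = np.radians(b[:, 1])[None, :]
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    # clip guards against rounding pushing h slightly outside [0, 1]
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


c1 = np.array([(-34.7102, -58.3853), (-38.9412, -67.9948)])
c2 = np.array([(-34.7634, -58.2120), (-43.2568, -65.2853), (-34.6356, -68.369)])
print(haversine_matrix_m(c1, c2) / 1000)  # km, shape (2, 3)
```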
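Projection: the only acceptable solution I found involves projecting your coordinates onto a 2D plane with something like aeqd (azimuthal equidistant); I'm going to use pyproj for this. This lets you use scipy's cdist on the projected points and get much faster results, but precision degrades for pairs that are far from the lat_0, lon_0 coordinates used as the reference point of the aeqd projection (perhaps a different projection or some workaround could fix this).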
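To illustrate why accuracy falls off away from lat_0, lon_0, here is a numpy-only sketch of the azimuthal equidistant projection using the standard spherical formulas. This is an assumption-laden stand-in: pyproj's aeqd works on the ellipsoid, and the function name `aeqd_sphere` and the center point are hypothetical. Distances measured from the projection center are preserved exactly; distances between two other points are only approximate.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_008.8  # assumption: spherical Earth, mean radius in meters


def aeqd_sphere(points, lat0, lon0):
    """Spherical azimuthal equidistant projection of (lat, lon) degrees to (x, y) meters."""
    phi = np.radians(points[:, 0])
    dlam = np.radians(points[:, 1] - lon0)
    phi0 = np.radians(lat0)
    cos_c = np.sin(phi0) * np.sin(phi) + np.cos(phi0) * np.cos(phi) * np.cos(dlam)
    c = np.arccos(np.clip(cos_c, -1.0, 1.0))  # angular distance from the center
    k = np.ones_like(c)  # scale factor c / sin(c), with limit 1 at the center
    np.divide(c, np.sin(c), out=k, where=c > 1e-12)
    x = EARTH_RADIUS_M * k * np.cos(phi) * np.sin(dlam)
    y = EARTH_RADIUS_M * k * (np.cos(phi0) * np.sin(phi)
                              - np.sin(phi0) * np.cos(phi) * np.cos(dlam))
    return np.column_stack([x, y])


def haversine_m(p, q):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = np.radians([p[0], p[1], q[0], q[1]])
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(h))


center = (-35.0, -61.0)  # hypothetical projection center
pts = np.array([(-34.7102, -58.3853), (-38.9412, -67.9948)])
xy = aeqd_sphere(pts, *center)

# Distance from the center: preserved exactly
print(np.hypot(*xy[1]), haversine_m(center, pts[1]))
# Distance between two off-center points: only approximate
print(np.hypot(*(xy[0] - xy[1])), haversine_m(pts[0], pts[1]))
```

The tangential scale factor c / sin(c) grows with the angular distance c from the center, which is why pairs far from lat_0, lon_0 pick up larger errors.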
Below are the results of your loop and of the projection, for comparison.
Code:
import numpy as np
import pyproj
from scipy.spatial.distance import cdist
from geopy.distance import distance

c1 = np.array(
    [(-34.7102, -58.3853),
     (-32.9406, -60.7136),
     (-34.6001, -58.3729),
     (-38.9412, -67.9948),
     (-35.1871, -59.0968)]
)
c2 = np.array(
    [(-43.2568, -65.2853),
     (-31.4038, -64.1645),
     (-34.7634, -58.2120),
     (-34.4819, -58.5828),
     (-34.5669, -58.4515),
     (-34.6356, -68.369),
     (-34.4048, -58.6896)]
)

# use the mean (lat, lon) of all points as the center of the aeqd projection
lat_0, lon_0 = np.vstack([c1, c2]).mean(axis=0)

# transformer from WGS84 lon/lat to azimuthal equidistant coordinates in meters;
# always_xy=True means inputs are (lon, lat) and outputs are (x, y)
transformer = pyproj.Transformer.from_crs(
    "EPSG:4326",
    f"+proj=aeqd +lat_0={lat_0} +lon_0={lon_0} +datum=WGS84 +units=m",
    always_xy=True,
)

# transform coordinates (note the lon, lat argument order)
projected_c1 = np.column_stack(transformer.transform(c1[:, 1], c1[:, 0]))
projected_c2 = np.column_stack(transformer.transform(c2[:, 1], c2[:, 0]))

# pairwise distances: Euclidean on projected points (meters) vs geodesic via geopy (km)
sc_dist = cdist(projected_c1, projected_c2)
geo_distances = []
for c in c1:
    this_row = [distance(c, x).km for x in c2]
    geo_distances.append(this_row)

print("scipy\n")
print(sc_dist / 1000)  # meters -> km
print("\n")
print("geopy\n")
print(np.array(geo_distances))
Output:
scipy
[[1120.68384362 652.43817992 16.93436992 31.1480337 17.02161533
914.68158465 43.91751967]
[1212.75267066 367.46344647 307.41739698 261.2734859 276.57111944
733.44881488 248.25303017]
[1131.82744423 646.91757042 23.36452322 23.31086804 8.09877062
916.39849619 36.27486327]
[ 531.58906215 906.44775882 987.23837525 974.96389103 979.98229079
479.75111318 971.51078808]
[1042.57374645 631.42752409 93.47695658 91.28419725 90.64134205
849.25121659 94.46063802]]
geopy
[[1120.50400287 652.32406273 16.93254254 31.1392657 17.01619952
914.66757909 43.9058496 ]
[1212.7494454 367.3591636 307.3468806 261.21313155 276.50708156
733.28119124 248.19563872]
[1131.65345927 646.79571942 23.35783766 23.30613446 8.09745879
916.38027748 36.26700778]
[ 530.49964531 905.85826336 987.20594883 974.95078113 979.96382386
478.97343089 971.50158032]
[1042.44765568 631.37206038 93.47402012 91.2737422 90.63359193
849.24940173 94.44779778]]
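To put numbers on the projection's error, this small sketch compares the first and fourth rows of the two printed matrices (values copied from the output above, in km; the helper name `max_rel_error` is hypothetical). The fourth c1 point, (-38.9412, -67.9948), lies much farther from the mean (lat, lon) used as the aeqd center than the first, and its worst relative gap is correspondingly larger: roughly 0.03% in row 1 versus roughly 0.2% in row 4.

```python
import numpy as np

# Rows copied from the printed output above (km)
scipy_row1 = np.array([1120.68384362, 652.43817992, 16.93436992, 31.1480337,
                       17.02161533, 914.68158465, 43.91751967])
geopy_row1 = np.array([1120.50400287, 652.32406273, 16.93254254, 31.1392657,
                       17.01619952, 914.66757909, 43.9058496])
scipy_row4 = np.array([531.58906215, 906.44775882, 987.23837525, 974.96389103,
                       979.98229079, 479.75111318, 971.51078808])
geopy_row4 = np.array([530.49964531, 905.85826336, 987.20594883, 974.95078113,
                       979.96382386, 478.97343089, 971.50158032])


def max_rel_error(approx, reference):
    """Largest |approx - reference| / reference across a row."""
    return np.max(np.abs(approx - reference) / reference)


print(f"row 1: {max_rel_error(scipy_row1, geopy_row1):.4%}")
print(f"row 4: {max_rel_error(scipy_row4, geopy_row4):.4%}")
```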
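cdist also accepts a custom distance function, which you can pass like this:
from scipy.spatial.distance import cdist
from geopy.distance import distance as geodist # avoid naming confusion
sc_dist = cdist(c1, c2, lambda u, v: geodist(u, v).meters) # you can choose unit here
I'm not sure about the performance, though, since cdist still calls the function once per pair in Python.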
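Separately from speed, the full 50,000 × 15,000 result holds 750 million float64 values, about 6 GB. If you only need a reduction of it (for example, the nearest c2 point for each c1 point), a sketch like the following computes the matrix in row chunks with a vectorized haversine. Assumptions: spherical Earth with mean radius 6,371,008.8 m, random stand-in data, and hypothetical helper names `haversine_matrix_m` and `distance_chunks`; scikit-learn's `pairwise_distances_chunked` implements a similar chunked pattern.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_008.8  # assumption: spherical Earth, mean radius in meters


def haversine_matrix_m(a, b):
    """(n, m) great-circle distances in meters between (lat, lon)-degree arrays a and b."""
    lat1, lon1 = np.radians(a[:, :1]), np.radians(a[:, 1:])  # shape (n, 1)
    lat2, lon2 = np.radians(b[:, 0]), np.radians(b[:, 1])    # shape (m,)
    h = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0)))


def distance_chunks(c1, c2, rows_per_chunk=1000):
    """Yield (start_row, block) pairs that together cover the len(c1) x len(c2) matrix.

    Each block holds at most rows_per_chunk * len(c2) float64 values, so peak
    memory stays bounded instead of materializing every row at once.
    """
    for start in range(0, len(c1), rows_per_chunk):
        yield start, haversine_matrix_m(c1[start:start + rows_per_chunk], c2)


# Random stand-in data (hypothetical; replace with the real c1 and c2)
rng = np.random.default_rng(0)
c1 = np.column_stack([rng.uniform(-45, -30, 2500), rng.uniform(-70, -55, 2500)])
c2 = np.column_stack([rng.uniform(-45, -30, 800), rng.uniform(-70, -55, 800)])

# Example reduction: index of the nearest c2 point for every c1 point
nearest = np.empty(len(c1), dtype=np.intp)
for start, block in distance_chunks(c1, c2, rows_per_chunk=1000):
    nearest[start:start + len(block)] = block.argmin(axis=1)
```

Tune rows_per_chunk to the memory you can spare: with 15,000 columns, 1,000 rows per chunk is 15 million values, about 120 MB per block.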