Python: Computing the distance between two point coordinates using two columns
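I want to compute the distance between two coordinates. I know I can compute the haversine distance between two points. However, I wonder whether there is an easier way than writing a loop that applies the formula over the whole column (my loop also raises an error).
Here is some sample data: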
import math
import random
import pandas as pd

# Random values for the duration from one point to another
random_values = random.sample(range(2, 20), 8)
random_values
# Creating arrays for the coordinates
lat_coor = [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846]
lon_coor = [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827]
df = pd.DataFrame(
    {'duration': random_values,
     'latitude': lat_coor,
     'longitude': lon_coor
    })
df
duration latitude longitude
0 5 11.923855 57.723843
1 2 11.923862 57.723831
2 10 11.923851 57.723839
3 19 11.923847 57.723831
4 16 11.923865 57.723827
5 4 11.923841 57.723831
6 13 11.923860 57.723835
7 3 11.923846 57.723827
To compute the distance, this is what I tried:
# Looping over each row to compute the Haversine distance between two points
# Earth's radius (in m)
R = 6373.0 * 1000
lat = df["latitude"]
lon = df["longitude"]
for i in lat:
    lat1 = lat[i]
    lat2 = lat[i+1]
    for j in lon:
        lon1 = lon[i]
        lon2 = lon[i+1]
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        # Haversine formula
        a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
        c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
        distance = R * c
        print(distance) # in m
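However, running this raises an error.
The two points used for each distance should be taken from the same columns (consecutive observations).
First distance value:
11.923855 57.723843 (point1/observation1)
11.923862 57.723831 (point2/observation2)
Second distance value:
11.923862 57.723831 (point1/observation2)
11.923851 57.723839 (point2/observation3)
Third distance value:
11.923851 57.723839 (point1/observation3)
11.923847 57.723831 (point2/observation4)
... (and so on)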
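I understand that you want the pairwise haversine distances between all points in df. Here is how that can be done:
Be careful when using this approach with many points, because it adds one column per point and the frame grows quickly.
Setup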
import random
import pandas as pd

random_values = random.sample(range(2, 20), 8)
random_values
# Creating arrays for the coordinates
lat_coor = [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846]
lon_coor = [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827]
df = pd.DataFrame(
    {'duration': random_values,
     'latitude': lat_coor,
     'longitude': lon_coor
    })
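Get radians
import math
df['lat_rad'] = df.latitude.apply(math.radians)
# long_rad must come from the longitude column, not the latitude column
df['long_rad'] = df.longitude.apply(math.radians)
Compute the pairwise distances
from sklearn.metrics.pairwise import haversine_distances
for idx_from, from_point in df.iterrows():
    for idx_to, to_point in df.iterrows():
        column_name = f"Distance_to_point_{idx_from}"
        # haversine_distances expects [latitude, longitude] pairs in radians
        haversine_matrix = haversine_distances([[from_point.lat_rad, from_point.long_rad], [to_point.lat_rad, to_point.long_rad]])
        # Unit-sphere distance times Earth's radius in m, divided by 1000, gives km
        point_distance = haversine_matrix[0][1] * 6371000/1000
        df.loc[idx_to, column_name] = point_distance
df
Output as originally posted, in km (in that run long_rad had been computed from the latitude column, which is why lat_rad and long_rad coincide):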
duration latitude longitude lat_rad long_rad Distance_to_point_0 Distance_to_point_1 Distance_to_point_2 Distance_to_point_3 Distance_to_point_4 Distance_to_point_5 Distance_to_point_6 Distance_to_point_7
0 3 11.923855 57.723843 0.20811052928038845 0.20811052928038845 0.0 0.0010889626934743966 0.0006222644021223135 0.001244528808978787 0.0015556609862946524 0.002177925427923575 0.000777830496776312 0.0014000949117650525
1 13 11.923862 57.723831 0.2081106514534361 0.2081106514534361 0.0010889626934743966 0.0 0.0017112270955967099 0.002333491502453183 0.0004666982928202561 0.00326688812139797 0.00031113219669808446 0.0024890576052394482
2 14 11.923851 57.723839 0.2081104594672184 0.2081104594672184 0.0006222644021223135 0.0017112270955967099 0.0 0.0006222644068564735 0.002177925388416966 0.0015556610258012616 0.0014000948988986254 0.0007778305096427389
3 4 11.923847 57.723831 0.20811038965404832 0.20811038965404832 0.001244528808978787 0.002333491502453183 0.0006222644068564735 0.0 0.0028001897952734385 0.0009333966189447881 0.002022359305755099 0.0001555661027862654
4 5 11.923865 57.723827 0.20811070381331365 0.20811070381331365 0.0015556609862946524 0.0004666982928202561 0.002177925388416966 0.0028001897952734385 0.0 0.003733586414218225 0.0007778304895183407 0.002955755898059704
5 7 11.923841 57.723831 0.20811028493429318 0.20811028493429318 0.002177925427923575 0.00326688812139797 0.0015556610258012616 0.0009333966189447881 0.003733586414218225 0.0 0.002955755924699886 0.0007778305161585227
6 9 11.92386 57.723835 0.20811061654685106 0.20811061654685106 0.000777830496776312 0.00031113219669808446 0.0014000948988986254 0.002022359305755099 0.0007778304895183407 0.002955755924699886 0.0 0.002177925408541364
7 8 11.923846 57.723827 0.20811037220075576 0.20811037220075576 0.0014000949117650525 0.0024890576052394482 0.0007778305096427389 0.0001555661027862654 0.002955755898059704 0.0007778305161585227 0.002177925408541364 0.0
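The nested iterrows loop above evaluates the distance one pair at a time. As a rough sketch of an alternative, the whole distance matrix can be computed in a single step with NumPy broadcasting. The helper name pairwise_haversine_km and the constant EARTH_RADIUS_KM are illustrative choices, not part of the original answer, and the result is a separate array rather than new columns on df.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, same value as in the loop above


def pairwise_haversine_km(lat_deg, lon_deg):
    """Return an (n, n) array of haversine distances in km (hypothetical helper)."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Broadcasting a column against a row yields every pairwise difference
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Guard against rounding pushing a slightly outside [0, 1]
    a = np.clip(a, 0.0, 1.0)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


lat_coor = [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846]
lon_coor = [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827]
dist_km = pairwise_haversine_km(lat_coor, lon_coor)
```

Here dist_km[i, j] is the distance between point i and point j, so the diagonal is zero and the matrix is symmetric. If sklearn is already in use, passing the full array of [lat_rad, long_rad] pairs to haversine_distances in one call should produce the same matrix on the unit sphere.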
OK, first you can create a dataframe that pairs each measurement with the previous one:
df2 = pd.concat([df.add_suffix('_pre').shift(), df], axis=1)
df2
This outputs:
duration_pre latitude_pre longitude_pre duration latitude longitude
0 NaN NaN NaN 5 11.923855 57.723843
1 5.0 11.923855 57.723843 2 11.923862 57.723831
2 2.0 11.923862 57.723831 10 11.923851 57.723839
…
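Then create a haversine function and apply it to the rows:
import math

def haversine(lat1, lon1, lat2, lon2):
    R = 6373.0 * 1000  # Earth's radius in m
    # math.sin and math.cos expect radians, so convert the degree inputs first
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = math.sin(dlat / 2)**2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2)**2
    return R * 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))

df2.apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']), axis=1)
This computes the distance from each row to the previous one (so the first row is NaN). The values below are from the originally posted function, which passed degrees to the trigonometric functions without converting them, so results with the conversion will differ: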
0 NaN
1 75.754755
2 81.120210
3 48.123604
…
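And, if you want to add the new column to the original dataframe in a single line:
df['distance'] = pd.concat([df.add_suffix('_pre').shift(), df], axis=1).apply(lambda x: haversine(x['latitude_pre'], x['longitude_pre'], x['latitude'], x['longitude']), axis=1)
Output (as originally posted, from the version of haversine without the degree-to-radian conversion):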
duration latitude longitude distance
0 5 11.923855 57.723843 NaN
1 2 11.923862 57.723831 75.754755
2 10 11.923851 57.723839 81.120210
3 19 11.923847 57.723831 48.123604
4 16 11.923865 57.723827 116.515304
5 4 11.923841 57.723831 154.307571
6 13 11.923860 57.723835 122.794838
7 3 11.923846 57.723827 98.115312
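apply with axis=1 calls the Python function once per row. A possible vectorized variant, sketched here on the assumption that the coordinates are stored in degrees, combines shift with NumPy functions so the whole column is computed at once. The helper name haversine_m, its radius_m parameter, and the column name distance_m are made up for this example.

```python
import numpy as np
import pandas as pd


def haversine_m(lat1, lon1, lat2, lon2, radius_m=6_371_000.0):
    """Vectorized haversine distance in meters; inputs in degrees (hypothetical helper)."""
    # Convert every input from degrees to radians before the trigonometry
    lat1, lon1, lat2, lon2 = (np.radians(v) for v in (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_m * np.arcsin(np.sqrt(a))


df = pd.DataFrame({
    "latitude": [11.923855, 11.923862, 11.923851, 11.923847, 11.923865, 11.923841, 11.923860, 11.923846],
    "longitude": [57.723843, 57.723831, 57.723839, 57.723831, 57.723827, 57.723831, 57.723835, 57.723827],
})

# shift() moves each row down by one, so every row is compared with its predecessor
df["distance_m"] = haversine_m(
    df["latitude"].shift(), df["longitude"].shift(),
    df["latitude"], df["longitude"],
)
```

The first row has no predecessor, so its distance is NaN, matching the behaviour of the concat and apply approach. For these sample points the consecutive distances come out at roughly one to three meters.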
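You mixed up the index with the values themselves. That is why you get a KeyError: `for i in lat` yields the values, and there is no lat[i] for such an i in your example (e.g. lat[11.923855]). Once i is an index, your code would run past the last row of lat and lon with [i+1]. Since you want to compare each row with the previous one, how about starting at index 1 and looking back by 1? That way you never go out of range. This edited version of your code does not crash: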
for i in range(1, len(lat)):
    # Convert degrees to radians, since math.sin and math.cos expect radians
    lat1 = math.radians(lat[i - 1])
    lat2 = math.radians(lat[i])
    # lon is indexed with the same i, so no inner loop over lon is needed
    lon1 = math.radians(lon[i - 1])
    lon2 = math.radians(lon[i])
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    # Haversine formula
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    distance = R * c
    print(distance)  # in m
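The index-versus-value mix-up described above can be reproduced on a small Series. This is only an illustrative sketch: the three-element Series and the variable names values_seen, labels_seen, lookup_failed and pairs are made up for the demonstration.

```python
import pandas as pd

lat = pd.Series([11.923855, 11.923862, 11.923851])

# Iterating over a Series yields its values, not its index labels
values_seen = [v for v in lat]
labels_seen = list(lat.index)

# Using a value as a label fails, because the labels are 0, 1, 2
try:
    lat[values_seen[0]]  # equivalent to lat[11.923855]
    lookup_failed = False
except KeyError:
    lookup_failed = True

# Iterating over positions and looking back by one stays within range
pairs = [(lat[i - 1], lat[i]) for i in range(1, len(lat))]
```

With n rows this produces n - 1 consecutive pairs, which is why the corrected loop starts at index 1.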