How to compute haversine distances on pandas DataFrames using the haversine library

Here is how I use the haversine library to calculate the distance between two points:

import haversine as hs
# hs.haversine expects each point as (lat, lon) and returns kilometers by default
hs.haversine((-1.94091666666667, 106.11333888888888), (5.204783, 96.698661))

And here is how to compute haversine distances with sklearn:
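from sklearn.metrics.pairwise import haversine_distances
import numpy as np
import pandas as pd
# haversine_distances expects [lat, lon] in radians
radian_1 = np.radians(df1[['lat','lon']])
radian_2 = np.radians(df2[['lat','lon']])
# multiply the angular distances by the Earth's radius (6371 km) to get kilometers
D = pd.DataFrame(haversine_distances(radian_1,radian_2)*6371,index=df1.index, columns=df2.index)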

What I need is to do something similar, but using the haversine library instead of sklearn.metrics.pairwise.

Here is my dataset df1:

   index       lon        lat
0   0   107.071969  -6.347778
1   1   110.431361  -7.773489
2   2   111.978469  -8.065442

And dataset df2:

    index      lon        lat
5   5   112.340919  -7.520442
6   6   107.179119  -6.291131
7   7   106.807442  -6.437383

This is the expected output:

        5           6           7
    0  596.019968   13.413123   30.882602
    1  212.317223  394.942014  426.564799
    2   72.573637  565.020998  598.409848

Following the documentation and example for sklearn.metrics.pairwise.haversine_distances:

# 6371000 m is the Earth's radius; dividing by 1000 converts meters to kilometers
result = haversine_distances(np.radians(df_1[["lat","lon"]]), np.radians(df_2[["lat", "lon"]])) * 6371000/1000
result_df = pd.DataFrame(result, index = df_1["index"], columns=df_2["index"])

index           5           6           7
index
0      596.019968   13.413123   30.882602
1      212.317223  394.942014  426.564799
2       72.573637  565.020998  598.409848

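You first need to convert the latitudes and longitudes to radians. haversine_distances then returns angular distances in radians, so you need to multiply them by the Earth's radius (about 6371 km) to get distances in kilometers.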
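
To make the two steps concrete (degrees to radians, then angle times radius), here is a minimal pure-Python sketch of the haversine formula for a single pair of points. The function name haversine_km and the constant EARTH_RADIUS_KM are illustrative names, not part of sklearn or the haversine library, and 6371 km is the assumed mean Earth radius used in the code above.

```python
import math

# Assumed mean Earth radius, matching the 6371 factor used with sklearn above
EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    # Step 1: convert degrees to radians
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    # Step 2: the haversine formula yields the central angle in radians
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin((lam2 - lam1) / 2) ** 2)
    central_angle = 2 * math.asin(math.sqrt(a))
    # Step 3: angle (radians) times radius (km) gives the distance in km
    return central_angle * EARTH_RADIUS_KM

# Row 0 of df1 against row 6 of df2 from the question
print(haversine_km(-6.347778, 107.071969, -6.291131, 107.179119))
```

The printed value should agree with the 13.413123 entry in the expected output. Skipping step 1 or step 3 is the usual reason a result looks wrong by a large factor.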

You can use itertools.product to generate every pair of points, then apply haversine to each pair as follows:

import haversine as hs
import pandas as pd
import numpy as np
import itertools

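# hs.haversine expects (lat, lon), so select the columns in that order
points_1 = df1[['lat', 'lon']].values
points_2 = df2[['lat', 'lon']].values

res = []
for a, b in itertools.product(points_1, points_2):
    res.append(hs.haversine(a, b))  # kilometers by default

# one row per point in df1, one column per point in df2
df = pd.DataFrame(np.asarray(res).reshape(len(df1), len(df2)),
                  index=df1.index, columns=df2.index)
print(df)

With the coordinates in (lat, lon) order, the result agrees with the expected output, up to the slightly different Earth radius the haversine library uses (6371.0088 km rather than 6371 km). If the rows are passed as (lon, lat) instead, which is what df1.values and df2.values give for these frames, the numbers come out different:

            0           1           2
0  587.500555   12.058061   29.557005
1  212.580742  365.487782  405.718803
2   46.333180  537.684789  578.072579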