Add rows from one dataframe to another based on distances between their geographic coordinates
I have a problem similar to the one here.
Two dataframes:
import pandas as pd

df1 = pd.DataFrame({'id': [1,2,3],
                    'lat':[-23.48, -22.94, -23.22],
                    'long':[-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100,200,300],
                    'lat':[-28.48, -22.94, -23.22],
                    'long':[-46.36, -46.40, -45.80]})
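My question is: using the solution suggested there by Ben.T, how can I add rows from df2 to df1 when a point in df2 is NOT near any point in df1? I think it should be based on the distance matrix: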
import numpy as np
from sklearn.metrics.pairwise import haversine_distances

# distance threshold in meters; change as needed
threshold = 100 # meters
# another parameter
earth_radius = 6371000 # meters
distance_matrix = (
    # get the distance between all points of each DF
    haversine_distances(
        # note that you need to convert to radians with *np.pi/180
        X=df1[['lat','long']].to_numpy()*np.pi/180,
        Y=df2[['lat','long']].to_numpy()*np.pi/180)
    # get the distance in meters
    *earth_radius
    # compare to your threshold
    < threshold
    # **here I want to add rows from df2 to df1 if point from df2 is NOT near df1**
)
For example, the output would look like this:
id lat long
1 -23.48 -46.36
2 -22.94 -45.40
3 -23.22 -45.80
4 -28.48 -46.36
5 -22.94 -46.40
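The distance matrix gives you a boolean array of shape (len(df1), len(df2)), where True means the two points are "close". You can find out, for each element of df2, whether any point in df1 is close enough to it by reducing the matrix with any along axis 0: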
In [33]: df2_has_close_point_in_df1 = distance_matrix.any(axis=0)
In [34]: df2_has_close_point_in_df1
Out[34]: array([False, False, True])
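To make the choice of axis concrete, here is a minimal sketch using a made-up boolean matrix (2 points in df1, 3 points in df2). The values and the variable names `is_close`, `per_df2_point`, and `per_df1_point` are illustrative assumptions, not computed from the data above.

```python
import numpy as np

# Hypothetical "is close" matrix: rows stand for df1 points, columns for df2 points.
is_close = np.array([
    [False, True, False],
    [False, False, False],
])

# Reducing along axis 0 collapses the rows: one answer per column (per df2 point).
per_df2_point = is_close.any(axis=0)

# Reducing along axis 1 collapses the columns: one answer per row (per df1 point).
per_df1_point = is_close.any(axis=1)

print(per_df2_point.shape, per_df1_point.shape)
```

Because the goal is to filter df2, the mask needs one entry per df2 row, which is why axis 0 is the one to reduce over here.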
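You can then use this as a mask to filter df2. Use the bitwise negation operator ~ to invert the True/False values (so that you keep only the rows of df2 that are not close):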
In [35]: df2.iloc[~df2_has_close_point_in_df1]
Out[35]:
id lat long
0 100 -28.48 -46.36
1 200 -22.94 -46.40
This can now be concatenated with df1 to get the combined dataset:
In [36]: combined = pd.concat([df1, df2.iloc[~df2_has_close_point_in_df1]], axis=0)
In [37]: combined
Out[37]:
id lat long
0 1 -23.48 -46.36
1 2 -22.94 -45.40
2 3 -23.22 -45.80
0 100 -28.48 -46.36
1 200 -22.94 -46.40
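The combined result above keeps df2's original ids (100, 200) and repeats the index labels 0 and 1, whereas the expected output in the question shows ids running 1 to 5. The following is one possible end-to-end sketch that wraps the steps in a function and renumbers the result. The helper name `append_far_rows`, the constant `EARTH_RADIUS_M`, and the decision to overwrite `id` with a running counter are assumptions made for illustration; they are not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd
from sklearn.metrics.pairwise import haversine_distances

EARTH_RADIUS_M = 6371000  # Earth radius in meters, same value as in the question


def append_far_rows(df1, df2, threshold_m=100):
    """Return df1 plus the rows of df2 that have no df1 point within threshold_m.

    Hypothetical helper written for this example.
    """
    # haversine_distances expects [latitude, longitude] pairs in radians
    a = np.radians(df1[['lat', 'long']].to_numpy())
    b = np.radians(df2[['lat', 'long']].to_numpy())
    # shape (len(df1), len(df2)); True where the pair is within the threshold
    is_close = haversine_distances(a, b) * EARTH_RADIUS_M < threshold_m
    # keep a df2 row only when no df1 point is close to it
    far_from_df1 = ~is_close.any(axis=0)
    # ignore_index=True builds a fresh 0..n-1 index instead of repeating labels
    return pd.concat([df1, df2[far_from_df1]], ignore_index=True)


df1 = pd.DataFrame({'id': [1, 2, 3],
                    'lat': [-23.48, -22.94, -23.22],
                    'long': [-46.36, -45.40, -45.80]})
df2 = pd.DataFrame({'id': [100, 200, 300],
                    'lat': [-28.48, -22.94, -23.22],
                    'long': [-46.36, -46.40, -45.80]})

combined = append_far_rows(df1, df2)
# Assumption: ids should simply continue 1, 2, 3, ... as in the expected output
combined['id'] = np.arange(1, len(combined) + 1)
print(combined)
```

If the original df2 ids need to be preserved, skip the renumbering line and keep only `ignore_index=True`.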