R - Find shortest distance between points across two dataframes

I need to determine the shortest distance between points across two dataframes.

Dataframe biz contains individual businesses, including their coordinates:

biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"), 
lon = c(-3.276435,-4.175388,-4.181740,-3.821941), 
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))

biz
  name       lon      lat
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277

Dataframe city contains market cities, including their geographic coordinates:

city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"), 
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861), 
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))

city
   name       lon      lat
1 cityA -4.758804 10.64002
2 cityB -3.243278 10.95790
3 cityC -3.062628 13.06950
4 cityD -2.356686 13.20363

For each business in biz, I need to determine which market city is closest and list the name of that city in a new column:

biz
  name       lon      lat     city
1 bizA -3.276435 11.96748
2 bizB -4.175388 12.19885
3 bizC -4.181740 13.04638
4 bizD -3.821941 11.84277

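I know I can use a package like geosphere to measure the distance between the coordinates of bizA and cityA. What I'm struggling with is how to compare bizA against every city, find the minimum distance, and then list the closest city in the biz dataframe.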
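A minimal base-R sketch of the "compare against every city, then minimise" step: build a business-by-city distance matrix and take which.min along each row. The haversine_km helper is hypothetical (written here so the example needs no packages) and assumes a spherical Earth with radius 6371 km; geosphere::distm() with two lon/lat matrices should produce an equivalent matrix (in metres) if you prefer that package.

```r
# Example data from the question
biz <- data.frame(name = c("bizA", "bizB", "bizC", "bizD"),
                  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
                  lat  = c(11.96748, 12.19885, 13.04638, 11.84277),
                  stringsAsFactors = FALSE)
city <- data.frame(name = c("cityA", "cityB", "cityC", "cityD"),
                   lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
                   lat  = c(10.64002, 10.95790, 13.06950, 13.20363),
                   stringsAsFactors = FALSE)

# Hypothetical helper: great-circle distance in km (haversine, spherical Earth)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# d[i, j] = distance from business i to city j
d <- outer(seq_len(nrow(biz)), seq_len(nrow(city)), function(i, j)
  haversine_km(biz$lon[i], biz$lat[i], city$lon[j], city$lat[j]))

# Column index of the smallest distance in each row (one row per business)
nearest <- apply(d, 1, which.min)

biz$city    <- city$name[nearest]
biz$dist_km <- d[cbind(seq_len(nrow(biz)), nearest)]
print(biz)
```

With this data, bizA and bizD should come out closest to cityB, and bizB and bizC closest to cityC.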

Any ideas are greatly appreciated!

You can use st_nearest_feature from sf:

library(sf)

# st_nearest_feature() returns, for each business, the row index of the nearest city
cbind(
  biz,
  nearest_city = city[
    st_nearest_feature(
      st_as_sf(biz, coords = c("lon", "lat"), crs = 4326), 
      st_as_sf(city, coords = c("lon", "lat"), crs = 4326)
    ),
  ]$name
)

although coordinates are longitude/latitude, st_nearest_feature assumes that they are planar
  name       lon      lat nearest_city
1 bizA -3.276435 11.96748     cityB
2 bizB -4.175388 12.19885     cityC
3 bizC -4.181740 13.04638     cityC
4 bizD -3.821941 11.84277     cityB
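If you also want to know how far each business is from its nearest city, one option (a sketch, not part of the original answer) is to pair each business with its nearest city and call sf's st_distance with by_element = TRUE. As far as I know, recent sf versions (1.0 and later) use s2 spherical geometry for lon/lat data by default (see sf_use_s2()), in which case the planar-assumption message above should not appear and distances are returned in metres.

```r
library(sf)

# Example data from the question
biz <- data.frame(name = c("bizA", "bizB", "bizC", "bizD"),
                  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
                  lat  = c(11.96748, 12.19885, 13.04638, 11.84277),
                  stringsAsFactors = FALSE)
city <- data.frame(name = c("cityA", "cityB", "cityC", "cityD"),
                   lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
                   lat  = c(10.64002, 10.95790, 13.06950, 13.20363),
                   stringsAsFactors = FALSE)

# Point geometries in WGS84 longitude/latitude (EPSG:4326)
biz_sf  <- st_as_sf(biz,  coords = c("lon", "lat"), crs = 4326)
city_sf <- st_as_sf(city, coords = c("lon", "lat"), crs = 4326)

# Row index of the nearest city for each business
idx <- st_nearest_feature(biz_sf, city_sf)
biz$nearest_city <- city$name[idx]

# Distance from each business to its own nearest city only (element-wise),
# with the units class dropped so the column is a plain numeric vector
biz$dist_m <- as.numeric(st_distance(biz_sf, city_sf[idx, ], by_element = TRUE))
print(biz)
```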

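I suppose there are several ways to do this. Here is one: first, use the dfcombos function from here to create all combinations of rows from the two dataframes. (I think there are alternatives in packages on CRAN.)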

The distance here is just a random number, for demonstration purposes.

After sorting with order, use duplicated to pick the closest city. There are alternatives to this step as well, but it seems straightforward.

# dfcombos() is defined in dfcombos.R, taken from the linked source
source('dfcombos.R')

biz <- structure(list(name = c("bizA", "bizB", "bizC", "bizD"), 
lon = c(-3.276435,-4.175388,-4.181740,-3.821941), 
lat = c(11.96748,12.19885,13.04638,11.84277)),
class = "data.frame",row.names = c(NA, -4L))

city <- structure(list(name = c("cityA", "cityB", "cityC", "cityD"), 
lon = c(-4.7588042,-3.2432781,-3.0626284,-2.3566861), 
lat = c(10.64002,10.95790,13.06950,13.20363)),
class = "data.frame",row.names = c(NA, -4L))

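# All combinations of rows from biz and city
comb <- dfcombos(biz, city)

# Placeholder: random numbers instead of real distances, for demonstration only
comb$dist <- runif(nrow(comb))

# Sort all business/city pairs from shortest to longest distance
comb <- comb[order(comb$dist), ]

# After sorting, the first occurrence of each business name is its closest city
closest <- comb[!duplicated(comb$name), ]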
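The same sort-then-deduplicate idea can be sketched with real distances and without the external helper. Here base R's expand.grid stands in for dfcombos (a substitution, since dfcombos.R is not reproduced), and haversine_km is a hypothetical helper assuming a spherical Earth of radius 6371 km. The last step writes the result back into biz as the requested city column.

```r
# Example data from the question
biz <- data.frame(name = c("bizA", "bizB", "bizC", "bizD"),
                  lon  = c(-3.276435, -4.175388, -4.181740, -3.821941),
                  lat  = c(11.96748, 12.19885, 13.04638, 11.84277),
                  stringsAsFactors = FALSE)
city <- data.frame(name = c("cityA", "cityB", "cityC", "cityD"),
                   lon  = c(-4.7588042, -3.2432781, -3.0626284, -2.3566861),
                   lat  = c(10.64002, 10.95790, 13.06950, 13.20363),
                   stringsAsFactors = FALSE)

# Hypothetical helper: great-circle distance in km (haversine, spherical Earth)
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  rad  <- pi / 180
  dlat <- (lat2 - lat1) * rad
  dlon <- (lon2 - lon1) * rad
  a <- sin(dlat / 2)^2 + cos(lat1 * rad) * cos(lat2 * rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Every business/city pair as row indices (base-R stand-in for dfcombos)
ij <- expand.grid(bi = seq_len(nrow(biz)), ci = seq_len(nrow(city)))
comb <- data.frame(
  biz_name  = biz$name[ij$bi],
  city_name = city$name[ij$ci],
  dist_km   = haversine_km(biz$lon[ij$bi], biz$lat[ij$bi],
                           city$lon[ij$ci], city$lat[ij$ci]),
  stringsAsFactors = FALSE
)

# Sort by distance, then keep the first (closest) row per business
comb    <- comb[order(comb$dist_km), ]
closest <- comb[!duplicated(comb$biz_name), ]

# Add the closest city back to biz, aligned by business name
biz$city <- closest$city_name[match(biz$name, closest$biz_name)]
print(biz)
```

Using match at the end keeps biz in its original row order, whereas closest itself is ordered by distance.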