left_join based on closest LAT_LON in R
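I'm trying to find the ID of the nearest LAT_LON in one data.frame with reference to my original data.frame. I have already worked this out for data.frames that share a unique identifier, by merging on that identifier and computing the distance with the distHaversine function from geosphere. Now I want to go a step further: join data.frames that have no unique identifier and find the ID of the nearest LAT-LON.
After merging I used the following code:
v3 <- v2 %>% mutate(CTD = distHaversine(cbind(LON.x, LAT.x), cbind(LON.y, LAT.y)))
Data: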
loc <- data.frame(station = c('Baker Street','Bank'),
lat = c(51.522236,51.5134047),
lng = c(-0.157080, -0.08905843),
postcode = c('NW1','EC3V'))
stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
lat = c(51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2'))
As the final result, I would like something like this:
df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'),
stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'),
dist = c('x','x','x','x','x','x','x','x'),
lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
)
Any help is appreciated. Thanks.
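Since the distances between the objects are small, we can speed up the computation by using the Euclidean distance between the coordinates. Because we are not near the equator, the lng coordinates are squished a bit; we can make the comparison slightly better by scaling lng by the cosine of the latitude.
# Use one reference latitude for both tables so they share the same scale
lng_scale <- cos(mean(c(stop$lat, loc$lat), na.rm = TRUE)/180*pi)
cor_stop <- stop[, c("lat", "lng")]
cor_stop$lng <- cor_stop$lng * lng_scale
cor_loc <- loc[, c("lat", "lng")]
cor_loc$lng <- cor_loc$lng * lng_scale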
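As a rough sanity check on this kind of scaling, the sketch below converts the scaled planar distance to metres for the two location/stop pairs that this answer ends up matching. The variable names (`near`, `lng_scale`, `approx_m`) are illustrative, and the radius of 6378137 m is assumed to match the default used by `distHaversine`. At this scale the results should land within a few metres of the great-circle distances.

```r
# Coordinates from the question: the two locations ...
loc  <- data.frame(lat = c(51.522236, 51.5134047),
                   lng = c(-0.157080, -0.08905843))
# ... and the stops they are matched to (Bayswater, Barbican)
near <- data.frame(lat = c(51.51224, 51.520865),
                   lng = c(-0.187569, -0.097758))

lat0      <- mean(c(loc$lat, near$lat))  # reference latitude in degrees
lng_scale <- cos(lat0 * pi / 180)        # shrink factor for longitude

# Planar distance in "scaled degrees", then converted to metres
deg      <- sqrt((loc$lat - near$lat)^2 + ((loc$lng - near$lng) * lng_scale)^2)
approx_m <- deg * (pi / 180) * 6378137   # assumed Earth radius in metres
approx_m
```

The approximation is only meant for ranking candidates cheaply; the final distances are still better computed with a proper great-circle formula.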
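We can then use the FNN package to find the nearest stop for each location; it uses tree-based search to quickly find the K nearest neighbours. This should scale to large data sets (I have used it on data sets with millions of records):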
library(FNN)
matches <- knnx.index(cor_stop, cor_loc, k = 1)
matches
## [,1]
## [1,] 4
## [2,] 2
We can then build the final result:
res <- loc
res$stop_station <- stop$station[matches[,1]]
res$stop_lat <- stop$lat[matches[,1]]
res$stop_lng <- stop$lng[matches[,1]]
res$stop_postcode <- stop$postcode[matches[,1]]
And compute the actual distances:
library(geosphere)
res$dist <- distHaversine(res[, c("lng", "lat")], res[, c("stop_lng", "stop_lat")])
res
## station lat lng postcode stop_station stop_lat stop_lng
## 1 Baker Street 51.52224 -0.15708000 NW1 Bayswater 51.51224 -0.187569
## 2 Bank 51.51340 -0.08905843 EC3V Barbican 51.52087 -0.097758
## stop_postcode dist
## 1 W2 2387.231
## 2 EC1A 1026.091
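If you are not sure that the nearest point in lat/lng terms is also the nearest point 'as the crow flies', you can use this method to first select the K nearest points in lat/lng, then compute the actual distances to those points, and then select the nearest one.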
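A minimal sketch of that two-stage idea follows, in base R only. Some assumptions: the candidate search is done by brute force here rather than with the FNN tree search (with FNN, the candidates would presumably come from `knnx.index(cor_stop, cor_loc, k = k)`), `haversine_m` is a hypothetical stand-in for `distHaversine`, and `k = 2` is an arbitrary choice.

```r
# Stand-in for geosphere::distHaversine: great-circle distance in metres
haversine_m <- function(lng1, lat1, lng2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

k <- 2  # number of candidates kept per location (arbitrary)
lng_scale <- cos(mean(c(loc$lat, stop$lat)) * pi / 180)

nearest <- sapply(seq_len(nrow(loc)), function(i) {
  # Stage 1: cheap planar distance (in scaled degrees) to every stop
  planar <- sqrt((stop$lat - loc$lat[i])^2 +
                 ((stop$lng - loc$lng[i]) * lng_scale)^2)
  cand <- order(planar)[seq_len(k)]
  # Stage 2: great-circle distance for the k candidates only
  exact <- haversine_m(loc$lng[i], loc$lat[i], stop$lng[cand], stop$lat[cand])
  cand[which.min(exact)]
})

res <- data.frame(loc = loc$station, stop = stop$station[nearest])
res$dist <- haversine_m(loc$lng, loc$lat, stop$lng[nearest], stop$lat[nearest])
res
```

A larger `k` costs more exact distance evaluations but makes it less likely that the true nearest stop is dropped in the first stage.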
All of the joining, distance calculation and plotting can be done with available R packages.
library(tidyverse)
library(sf)
#> Linking to GEOS 3.6.2, GDAL 2.2.3, PROJ 4.9.3
library(nngeo)
library(mapview)
## Original data
loc <- data.frame(station = c('Baker Street','Bank'),
lat = c(51.522236,51.5134047),
lng = c(-0.157080, -0.08905843),
postcode = c('NW1','EC3V'))
stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
lat = c(51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2'))
df <- data.frame(loc = c('Baker Street','Bank','Baker Street','Bank','Baker Street','Bank','Baker Street','Bank'),
stop = c('Angel','Barbican','Barons Court','Bayswater','Angel','Barbican','Barons Court','Bayswater'),
dist = c('x','x','x','x','x','x','x','x'),
lat = c(51.53253,51.520865,51.490281,51.51224,51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569,-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2','EC1V','EC1A', 'W14', 'W2')
)
## Create sf objects from lat/lon points
loc_sf <- loc %>% st_as_sf(coords = c('lng', 'lat'), remove = T) %>%
st_set_crs(4326)
stop_sf <- stop %>% st_as_sf(coords = c('lng', 'lat'), remove = T) %>%
st_set_crs(4326)
# Use st_nearest_feature to cbind loc to stop by nearest points
joined_sf <- stop_sf %>%
cbind(
loc_sf[st_nearest_feature(stop_sf, loc_sf),])
## mutate to add column showing distance between geometries
joined_sf %>%
mutate(dist = st_distance(geometry, geometry.1, by_element = TRUE))
#> Simple feature collection with 4 features and 5 fields
#> Active geometry column: geometry
#> geometry type: POINT
#> dimension: XY
#> bbox: xmin: -0.21434 ymin: 51.49028 xmax: -0.097758 ymax: 51.53253
#> epsg (SRID): 4326
#> proj4string: +proj=longlat +datum=WGS84 +no_defs
#> station postcode station.1 postcode.1 geometry
#> 1 Angel EC1V Bank EC3V POINT (-0.10579 51.53253)
#> 2 Barbican EC1A Bank EC3V POINT (-0.097758 51.52087)
#> 3 Barons Court W14 Baker Street NW1 POINT (-0.21434 51.49028)
#> 4 Bayswater W2 Baker Street NW1 POINT (-0.187569 51.51224)
#> geometry.1 dist
#> 1 POINT (-0.08905843 51.5134) 2424.102 [m]
#> 2 POINT (-0.08905843 51.5134) 1026.449 [m]
#> 3 POINT (-0.15708 51.52224) 5333.417 [m]
#> 4 POINT (-0.15708 51.52224) 2390.791 [m]
## Use nngeo and mapview to plot lines on a map
# NOT run for reprex, output image attached
#connected <- st_connect(stop_sf, loc_sf)
# mapview(connected) +
# mapview(loc_sf, color = 'red') +
# mapview(stop_sf, color = 'black')
Created on 2020-01-21 by the reprex package (v0.3.0)
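If you are able to use a projected coordinate system, you can avoid the nearest-neighbour search entirely. In that case you can cheaply build Voronoi polygons around each location; these polygons define the region that is closer to one input point than to any other.
You can then use a plain GIS intersection to find which points lie in which polygons, and compute distances only for the nearest pairs that are now known. I think this should be a lot faster. However, you cannot use Voronoi polygons with geographic coordinates.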
loc <- data.frame(station = c('Baker Street','Bank'),
lat = c(51.522236,51.5134047),
lng = c(-0.157080, -0.08905843),
postcode = c('NW1','EC3V'))
stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
lat = c(51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2'))
library(sf)
# Convert to a suitable PCS (in this case OSGB)
stop <- st_as_sf(stop, coords=c('lng','lat'), crs=4326)
stop <- st_transform(stop, crs=27700)
loc <- st_as_sf(loc, coords=c('lng','lat'), crs=4326)
loc <- st_transform(loc, crs=27700)
# Extract Voronoi polygons around locations and convert to an sf object
loc_voronoi <- st_collection_extract(st_voronoi(do.call(c, st_geometry(loc))))
loc_voronoi <- st_sf(loc_voronoi, crs=st_crs(loc))
# Match Voronoi polygons to locations and select that geometry
loc$voronoi <- loc_voronoi$loc_voronoi[unlist(st_intersects(loc, loc_voronoi))]
st_geometry(loc) <- 'voronoi'
# Find which location is closest to each stop
stop$loc <- loc$station[unlist(st_intersects(stop, loc))]
# Reset locs to use the point geometry and get distances
st_geometry(loc) <- 'geometry'
# match() looks rows up by station name, whether station is a factor or a character
stop$loc_dist <- st_distance(stop, loc[match(stop$loc, loc$station),], by_element=TRUE)
This gives you the following output:
Simple feature collection with 4 features and 4 fields
geometry type: POINT
dimension: XY
bbox: xmin: 524069.7 ymin: 178326.3 xmax: 532074.6 ymax: 183213.9
epsg (SRID): 27700
proj4string: +proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +towgs84=446.448,-125.157,542.06,0.15,0.247,0.842,-20.489 +units=m +no_defs
station postcode geometry loc loc_dist
1 Angel EC1V POINT (531483.8 183213.9) Bank 2423.722 [m]
2 Barbican EC1A POINT (532074.6 181931.2) Bank 1026.289 [m]
3 Barons Court W14 POINT (524069.7 178326.3) Baker Street 5332.478 [m]
4 Bayswater W2 POINT (525867.7 180813.9) Baker Street 2390.377 [m]
I'm not sure I have understood your question correctly, but you could first cross join loc and stop, and then add a column with the distance.
library(dplyr)
loc <- data.frame(station = c('Baker Street','Bank'),
lat = c(51.522236,51.5134047),
lng = c(-0.157080, -0.08905843),
postcode = c('NW1','EC3V'))
stop <- data.frame(station = c('Angel','Barbican','Barons Court','Bayswater'),
lat = c(51.53253,51.520865,51.490281,51.51224),
lng = c(-0.10579,-0.097758,-0.214340,-0.187569),
postcode = c('EC1V','EC1A', 'W14', 'W2'))
# Create data.tables
loc_dt <- data.table::setDT(loc)
stop_dt <- data.table::setDT(stop)
# Cross join two data.tables
coordinates_dt <- optiRum::CJ.dt(loc_dt, stop_dt)
# Add column with distance in meters
coordinates_dt %>%
mutate(dist_m = spatialrisk::haversine(lat, lng, i.lat, i.lng))
#> station lat lng postcode i.station i.lat i.lng
#> 1: Baker Street 51.52224 -0.15708000 NW1 Angel 51.53253 -0.105790
#> 2: Bank 51.51340 -0.08905843 EC3V Angel 51.53253 -0.105790
#> 3: Baker Street 51.52224 -0.15708000 NW1 Barbican 51.52087 -0.097758
#> 4: Bank 51.51340 -0.08905843 EC3V Barbican 51.52087 -0.097758
#> 5: Baker Street 51.52224 -0.15708000 NW1 Barons Court 51.49028 -0.214340
#> 6: Bank 51.51340 -0.08905843 EC3V Barons Court 51.49028 -0.214340
#> 7: Baker Street 51.52224 -0.15708000 NW1 Bayswater 51.51224 -0.187569
#> 8: Bank 51.51340 -0.08905843 EC3V Bayswater 51.51224 -0.187569
#> i.postcode dist_m
#> 1: EC1V 3732.422
#> 2: EC1V 2423.989
#> 3: EC1A 4111.786
#> 4: EC1A 1026.091
#> 5: W14 5328.649
#> 6: W14 9054.998
#> 7: W2 2387.231
#> 8: W2 6825.897
Created on 2021-04-07 by the reprex package (v1.0.0)
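If only the closest stop per location is wanted, the cross-joined table can be reduced to one row per location. The sketch below does this in base R under a few assumptions: `merge(..., by = NULL)` is used for the cross join instead of `optiRum::CJ.dt`, `haversine_m` is a hypothetical stand-in for `spatialrisk::haversine`, and the `loc_`/`stop_` column prefixes are added only to keep the two sides apart.

```r
# Stand-in great-circle distance in metres (radius assumed to be 6378137 m)
haversine_m <- function(lng1, lat1, lng2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

loc <- data.frame(station = c("Baker Street", "Bank"),
                  lat = c(51.522236, 51.5134047),
                  lng = c(-0.157080, -0.08905843))
stop <- data.frame(station = c("Angel", "Barbican", "Barons Court", "Bayswater"),
                   lat = c(51.53253, 51.520865, 51.490281, 51.51224),
                   lng = c(-0.10579, -0.097758, -0.214340, -0.187569))

# Prefix the columns so both sides stay distinguishable after the join
names(loc)  <- paste0("loc_", names(loc))
names(stop) <- paste0("stop_", names(stop))

# Cross join: every location paired with every stop (2 x 4 = 8 rows)
pairs <- merge(loc, stop, by = NULL)
pairs$dist_m <- haversine_m(pairs$loc_lng, pairs$loc_lat,
                            pairs$stop_lng, pairs$stop_lat)

# Sort by location and distance, then keep the first (closest) row per location
pairs   <- pairs[order(pairs$loc_station, pairs$dist_m), ]
nearest <- pairs[!duplicated(pairs$loc_station), ]
rownames(nearest) <- NULL
nearest[, c("loc_station", "stop_station", "dist_m")]
```

The cross join grows with the product of the two table sizes, so this is likely only practical for small inputs; for larger ones a nearest-neighbour search avoids building every pair.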