Geocoding: Efficient way to find the distance between two sets of locations
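I have a set of location coordinates for different individuals, and another set of coordinates for the drop-off boxes for their ballots. I'm trying to find the distance between each person's residence and the nearest drop-off box. I've attached a copy of the code I have to work with, which was copied from another Stack Overflow example. However, it is not very efficient: the dataset I'm working with has millions of rows, and the code relies on generating every possible combination of coordinates and then pulling the smallest distance. Is there a more efficient way to approach this?
What I currently have: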
# Made-Up Data
library(geosphere)
library(tidyverse)
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long = c(-43.17536, -43.17411, -43.36605),
                         lat = c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long = c(-43.19155, -43.33636, -67.45666),
                                lat = c(-22.90353, -22.87253, -26.78901))
# Code to find the distance between voters, and the dropoff boxes
# Order into a newdf as needed first.
# First, the voters:
voter_addresses <- data.frame(voter_id = as.character(geo_voters$voter_id),
                              lon_address = geo_voters$long,
                              lat_address = geo_voters$lat
)
# Second, the polling locations:
polling_address <- data.frame(place_number = 1:nrow(geo_dropoff_boxes),
                              lon_place = geo_dropoff_boxes$long,
                              lat_place = geo_dropoff_boxes$lat
)
# Create nested dfs:
voter_nest <- nest(voter_addresses, -voter_id, .key = 'voter_coords')
polling_nest <- nest(polling_address, -place_number, .key = 'polling_coords')
# Combine for combinations:
data_master <- crossing(voter_nest, polling_nest)
# Calculate shortest distance:
shortest_dist <- data_master %>%
  mutate(dist = map2_dbl(voter_coords, polling_coords, distm)) %>%
  group_by(voter_id) %>%
  filter(dist == min(dist)) %>%
  mutate(dist_km = dist/1000,
         voter_id = as.character(voter_id)) %>%
  select(voter_id, dist_km)
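The sf package makes this simple. The st_as_sf() function converts a data frame of longitude/latitude values into georeferenced points, and the st_distance() function computes the distances between them. When running st_as_sf(), you need to specify a coordinate reference system. It looks like you're using latitude and longitude, so I specified crs="epsg:4326", which is the most common latitude/longitude reference.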
library( sf )
geo_voters <- data.frame(voter_id = c(12345, 45678, 89011),
                         long=c(-43.17536, -43.17411, -43.36605),
                         lat=c(-22.95414, -22.9302, -23.00133))
geo_dropoff_boxes <- data.frame(long=c(-43.19155, -43.33636, -67.45666),
                                lat=c(-22.90353, -22.87253, -26.78901))
# convert the data to sf features
geo_voters = st_as_sf( geo_voters, coords=c('long', 'lat'), crs="epsg:4326" )
geo_dropoff_boxes = st_as_sf( geo_dropoff_boxes, coords=c('long', 'lat'), crs="epsg:4326" )
# calculate the distances between voters and drop boxes
dist = st_distance( geo_voters, geo_dropoff_boxes )
print(dist)
Now each row represents a voter and each column represents a drop box; each value is the distance between the two, in meters:
Units: [m]
          [,1]     [,2]    [,3]
[1,]  5866.745 18821.87 2482400
[2,]  3461.945 17813.57 2483210
[3,] 20916.618 14641.09 2462186
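The question asks for the distance to the nearest box only, and the full matrix still holds one cell per voter/box pair, which may not fit in memory with millions of voters. The sketch below is not part of the original answer. It shows two ways to get the nearest box, assuming a reasonably recent sf where st_nearest_feature() handles longitude/latitude data through s2 (with s2 switched off, sf warns that it treats the coordinates as planar). The names voters_sf, boxes_sf, nearest_box, and result are made up for illustration.

```r
library(sf)

# Same made-up data as in the question
geo_voters <- data.frame(
  voter_id = c(12345, 45678, 89011),
  long = c(-43.17536, -43.17411, -43.36605),
  lat = c(-22.95414, -22.9302, -23.00133)
)
geo_dropoff_boxes <- data.frame(
  long = c(-43.19155, -43.33636, -67.45666),
  lat = c(-22.90353, -22.87253, -26.78901)
)

voters_sf <- st_as_sf(geo_voters, coords = c("long", "lat"), crs = "epsg:4326")
boxes_sf <- st_as_sf(geo_dropoff_boxes, coords = c("long", "lat"), crs = "epsg:4326")

# Option 1: reduce the full distance matrix row by row.
# Fine for small inputs, but it still builds every voter/box pair.
dist_matrix <- st_distance(voters_sf, boxes_sf)
dist_plain <- matrix(as.numeric(dist_matrix), nrow = nrow(dist_matrix))
nearest_from_matrix <- apply(dist_plain, 1, which.min)

# Option 2: ask sf for the index of the nearest box for each voter,
# then measure only those matched pairs (one distance per voter).
nearest_box <- st_nearest_feature(voters_sf, boxes_sf)
nearest_dist_m <- st_distance(voters_sf, boxes_sf[nearest_box, ], by_element = TRUE)

result <- data.frame(
  voter_id = as.character(geo_voters$voter_id),
  nearest_box = nearest_box,
  dist_km = as.numeric(nearest_dist_m) / 1000
)
print(result)
```

For the sample data, the matrix above implies that the first two voters are closest to box 1 and the third to box 2, so both options should agree on that. Whether option 2 is fast enough at the scale described in the question is something to benchmark on the real data rather than assume.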