Mapping addresses to the nearest metropolitan area
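I have a dataset of locations and I am trying to map each one to its nearest metropolitan areas. Dataset 1 (df1) contains address locations with latitude and longitude. I want to map these addresses to all metropolitan areas (in data frame df2) within a 50-mile radius.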
g_lat <- c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493)
g_lon <- c(-122.67648,-86.13490, -118.24368, -122.08385, -122.41942)
address <- c(1,2,3,4,5)
df1 <- data.frame(g_lat, g_lon, address)
g_lat <- c(+37.7737185, +45.5222208,+37.77493)
g_lon <- c(-122.2744317,-098.7041549,-122.41942)
msa <- c(1,2,3)
df2 <- data.frame(g_lat, g_lon, msa)
I would like output like the following, showing every msa associated with an address:
address g_lat g_lon msa
5 37.77493 -122.41942 1
5 37.77493 -122.41942 3
Please let me know how to achieve this. I tried the following:
library(geosphere)
# create distance matrix
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)
This produces the error:
Error in .pointsToMatrix(y) : longitude < -360
# assign the name to the point in list1 based on shortest distance in the matrix
df1$locality <- df2$locality[max.col(-mat)]
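A possible solution:
library(geosphere)
# distance matrix in meters: rows are df1 addresses, columns are df2 metro areas
mat <- distm(df1[,c('g_lon','g_lat')], df2[,c('g_lon','g_lat')], fun=distVincentyEllipsoid)
# row and column indices of every pair closer than 80000 m (roughly 50 miles)
ri <- row(mat)[mat < 80000]
ci <- col(mat)[mat < 80000]
# repeat each matching address once per metro area and attach its msa
df3 <- df1[ri,]
df3$msa <- df2[ci, "msa"]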
which gives:
> df3
g_lat g_lon address msa
4 37.38605 -122.0838 4 1
5 37.77493 -122.4194 5 1
4.1 37.38605 -122.0838 4 3
5.1 37.77493 -122.4194 5 3
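The 80000 m cutoff above is a rounded stand-in for 50 miles (about 80467 m). Here is a minimal sketch, assuming you would rather state the radius in miles and collect the matching pairs in one step with which(arr.ind = TRUE). The names radius_miles, radius_m, hits and near are illustrative and not from the original post. distHaversine is swapped in for distVincentyEllipsoid as a faster spherical approximation, which can differ from the ellipsoidal result by a fraction of a percent.

```r
library(geosphere)

# Sample data from the question
df1 <- data.frame(
  g_lat = c(45.52306, 40.26719, 34.05223, 37.38605, 37.77493),
  g_lon = c(-122.67648, -86.13490, -118.24368, -122.08385, -122.41942),
  address = 1:5
)
df2 <- data.frame(
  g_lat = c(37.7737185, 45.5222208, 37.77493),
  g_lon = c(-122.2744317, -98.7041549, -122.41942),
  msa = 1:3
)

radius_miles <- 50                    # hypothetical parameter name
radius_m <- radius_miles * 1609.344   # 1 mile = 1609.344 m

# distm() returns meters; rows follow df1, columns follow df2
mat <- distm(df1[, c("g_lon", "g_lat")], df2[, c("g_lon", "g_lat")],
             fun = distHaversine)

# One row per (address, msa) pair that lies inside the radius
hits <- which(mat < radius_m, arr.ind = TRUE)
near <- cbind(df1[hits[, "row"], ],
              msa = df2$msa[hits[, "col"]],
              dist_m = mat[hits])
rownames(near) <- NULL
near
```

With the sample data this should pair addresses 4 and 5 with msa 1 and 3, the same four rows as df3.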
Or with data.table or dplyr:
library(data.table)
# setDT() converts df1 to a data.table by reference
setDT(df1)[ri][, msa := df2$msa[ci]][]
library(dplyr)
df1 %>%
  slice(ri) %>%
  mutate(msa = df2$msa[ci])
You can also add the distance (distm() returns meters):
df3$dist <- mat[cbind(ri, ci)]
which gives:
> df3
g_lat g_lon address msa dist
4 37.38605 -122.0838 4 1 46202.74
5 37.77493 -122.4194 5 1 12774.31
4.1 37.38605 -122.0838 4 3 52359.08
5.1 37.77493 -122.4194 5 3 0.00
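The sample data in the question runs without the longitude < -360 error, so that message most likely comes from values in the full dataset rather than from the code. Since it is reported for .pointsToMatrix(y), the second argument (df2) is the first place to look. Here is a minimal sketch, assuming the cause is out-of-range or mistyped coordinates. The helper name bad_coords and the toy data frame df2_check are hypothetical, and the check uses only base R.

```r
# Hypothetical helper: flag rows whose coordinates are missing or fall
# outside the valid ranges (latitude -90..90, longitude -180..180)
bad_coords <- function(df, lon = "g_lon", lat = "g_lat") {
  is.na(df[[lon]]) | is.na(df[[lat]]) |
    abs(df[[lon]]) > 180 | abs(df[[lat]]) > 90
}

# Made-up data: the second longitude has lost its decimal point
df2_check <- data.frame(
  g_lat = c(37.7737185, 45.5222208),
  g_lon = c(-122.2744317, -987041549),
  msa = c(1, 2)
)

# Inspect the offending rows, then keep only the valid ones
df2_check[bad_coords(df2_check), ]
df2_clean <- df2_check[!bad_coords(df2_check), ]
```

Running the same check on both data frames before calling distm() shows which rows need fixing. Swapped latitude and longitude columns are worth ruling out as well.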