R - Loop through a data frame of IDs and zip codes to find the next closest store (latitude/longitude) and return a list of store IDs with their next closest store
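I have used the zipcode package to look up the latitude and longitude of a set of stores from their zip codes.
I am hoping to find a way to loop through the list and, for each of the 5000 stores, find the next closest store based on longitude/latitude.
I currently have this data frame (values removed for this post):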
'data.frame': 1206 obs. of 6 variables:
$ zip : Factor w/ 1182 levels "86645","43225",..: 1 2 3 4 5 6 7 8 9 10 ...
$ ID : int
$ city : chr
$ state : chr
$ latitude : num
$ longitude: num
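The loop described in the question can be done one store at a time, without building a full store-by-store distance matrix. A minimal sketch, assuming a data frame with the ID, latitude and longitude columns shown above, at least two rows and no missing coordinates. The names haversine and nearestStore are hypothetical, and the distance is a hand-written Haversine formula (spherical Earth, radius 6378137 m, the default radius of geosphere::distHaversine) so the block runs with base R only.

```r
# Hypothetical helper: great-circle distance (metres) from one point to a vector
# of points, via the Haversine formula on a sphere of radius r.
haversine <- function(lon1, lat1, lon2, lat2, r = 6378137) {
  toRad <- pi / 180
  dLat <- (lat2 - lat1) * toRad
  dLon <- (lon2 - lon1) * toRad
  a <- sin(dLat / 2)^2 +
    cos(lat1 * toRad) * cos(lat2 * toRad) * sin(dLon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding just above 1
}

# Hypothetical function: for each store, return the ID of the closest *other*
# store. Assumes columns ID, latitude, longitude, >= 2 rows, no NA coordinates.
# Only one row of distances exists at a time, so memory grows linearly with n.
nearestStore <- function(df) {
  n <- nrow(df)
  closestIdx <- integer(n)
  closestDist <- numeric(n)
  for (i in seq_len(n)) {
    d <- haversine(df$longitude[i], df$latitude[i], df$longitude, df$latitude)
    d[i] <- Inf                     # exclude the store itself
    closestIdx[i] <- which.min(d)   # first index wins if distances tie
    closestDist[i] <- d[closestIdx[i]]
  }
  data.frame(ID = df$ID, closestID = df$ID[closestIdx], distance_m = closestDist)
}

# Example with three made-up stores
stores <- data.frame(
  ID = c(101L, 102L, 103L),
  latitude = c(40.0, 40.1, 35.0),
  longitude = c(-75.0, -75.0, -80.0)
)
nearestStore(stores)
```

For 5000 stores this is 5000 passes over a 5000-element vector, whereas a full matrix of doubles would need 5000 × 5000 × 8 bytes, about 200 MB.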
Here is one solution I came up with:
library(data.table)
library(zipcode)
library(geosphere)
data(zipcode)
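set.seed(151)
n <- 100
# Simulated stores: a unique store ID and a randomly drawn zip code for each
storeData <- data.table(storeID = sample(1:100000, n, replace = FALSE),
                        zip = sample(zipcode$zip, n, replace = TRUE))
zipcode <- data.table(zipcode, key = "zip")
# Attach city, state and coordinates by zip; keep only rows where both coordinates are known
storeData <- zipcode[storeData, on = "zip"][!is.na(latitude) & !is.na(longitude)]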
storeData
# zip city state latitude longitude storeID
# 1: 22408 Fredericksburg VA 38.23602 -77.46111 47945
# 2: 44515 Youngstown OH 41.09901 -80.74545 86541
# 3: 48112 Belleville MI 42.23993 -83.15082 77807
# 4: 80154 Englewood CO 39.73875 -104.40835 53862
# 5: 73766 Pond Creek OK 36.66271 -97.83063 44166
# 6: 32321 Bristol FL 30.36007 -84.97668 61377
# 7: 49442 Muskegon MI 43.23262 -86.19550 45492
# 8: 04537 Boothbay ME 43.90781 -69.64608 82087
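# Pairwise distance matrix in metres (distm defaults to distHaversine; columns must be longitude, latitude)
storeDistances <- distm(storeData[, .(longitude, latitude)], storeData[, .(longitude, latitude)])
colnames(storeDistances) <- rownames(storeDistances) <- storeData[, storeID]
# A store must never be reported as its own neighbour, even when two stores share a zip code
diag(storeDistances) <- Inf
# ID of the number-th closest store for every row; order() yields exactly one index even when distances tie
getClosest <- function(number = 1) {
  apply(storeDistances, 1, function(x) colnames(storeDistances)[order(x)[number]])
}
storeData[, firstClosest := getClosest(1)]
storeData[, secondClosest := getClosest(2)]
storeData[, thirdClosest := getClosest(3)]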
storeData
# zip city state latitude longitude storeID firstClosest secondClosest
# 1: 22408 Fredericksburg VA 38.23602 -77.46111 47945 70091 41024
# 2: 44515 Youngstown OH 41.09901 -80.74545 86541 10806 78898
# 3: 48112 Belleville MI 42.23993 -83.15082 77807 25906 94780
# 4: 80154 Englewood CO 39.73875 -104.40835 53862 22347 91392
# 5: 73766 Pond Creek OK 36.66271 -97.83063 44166 4816 90090
# 6: 32321 Bristol FL 30.36007 -84.97668 61377 8187 1937
# 7: 49442 Muskegon MI 43.23262 -86.19550 45492 95486 97241
# 8: 04537 Boothbay ME 43.90781 -69.64608 82087 46720 7013
#
# thirdClosest
# 1: 57562
# 2: 71232
# 3: 86541
# 4: 97986
# 5: 146
# 6: 8113
# 7: 6400
# 8: 10872
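storeDistances is the distance matrix between every pair of stores, in metres (distm uses the Haversine distance by default). The getClosest function returns, for every store, the ID of the number-th closest store.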
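Because the sample draws zip codes with replace = TRUE, two stores can share coordinates and sit at distance 0 from each other. Then the self-distance is no longer the unique minimum, and matching on which(x == sort(x)[number + 1]) can return two IDs (one of them the store itself). A toy sketch of a tie-safe ranking on a hand-made 4 × 4 matrix in which hypothetical stores A and B share a location; kthClosest is a hypothetical stand-in for getClosest that takes the matrix as an argument.

```r
# Toy distance matrix (arbitrary units) for four hypothetical stores.
# A and B share a location, so their mutual distance is 0.
ids <- c("A", "B", "C", "D")
d <- matrix(c(0, 0, 2, 6,
              0, 0, 2, 6,
              2, 2, 0, 5,
              6, 6, 5, 0),
            nrow = 4, byrow = TRUE, dimnames = list(ids, ids))

# Hypothetical tie-safe ranking: rule out self-matches, then take the k-th
# position from order(), which gives exactly one index per row
# (tied distances keep their original column order).
kthClosest <- function(distances, k = 1) {
  diag(distances) <- Inf  # modifies the function's local copy only
  apply(distances, 1, function(x) colnames(distances)[order(x)[k]])
}

kthClosest(d, 1)  # A -> B, B -> A, C -> A (tied with B), D -> C
kthClosest(d, 2)
```

Setting the diagonal to Inf, rather than skipping the first sorted element, is what keeps a co-located store from being dropped or doubled.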