Efficiently Calculate Distance using geosphere package
I have data in the following format (about 1 million rows):
head(dt)
pickup_longitude pickup_latitude dropoff_longitude dropoff_latitude
1: -74.00394 40.74289 -73.99337 40.73425
2: -73.97386 40.75219 -73.95870 40.77253
3: -73.95441 40.76442 -73.97078 40.75835
4: -73.96234 40.76722 -73.97551 40.75687
5: -74.00466 40.70743 -73.99937 40.72152
6: -73.99557 40.71602 -73.99997 40.74332
library(data.table)
library(geosphere)
dt = data.table(pickup_longitude = c(-74.00394, -73.97386, -73.95441, -73.96234, -74.00466, -73.99557),
                pickup_latitude = c(40.74289, 40.75219, 40.76442, 40.76722, 40.70743, 40.71602),
                dropoff_longitude = c(-73.99337, -73.95870, -73.97078, -73.97551, -73.99937, -73.99997),
                dropoff_latitude = c(40.73425, 40.77253, 40.75835, 40.75687, 40.72152, 40.74332))
# Row by row: distm is called once per row, which is slow for ~1 million rows
dt[, distance := apply(dt, 1, function(t) distm(x = c(t[1], t[2]), y = c(t[3], t[4])))]
I used apply in the code above because the distm function in the geosphere package is not vectorized. However, the apply call takes a long time.
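To make the distm behaviour concrete, here is a minimal sketch using three of the sample rows. The variable names (pickup, dropoff, all_pairs, row_wise) are illustrative and not part of the original code. When given two sets of points, distm returns the distance from every point in the first set to every point in the second, so the row-by-row distances are only the diagonal of that matrix, and building the full matrix for about 1 million rows would not be practical.

```r
library(geosphere)

# Three pickup and three dropoff points as (longitude, latitude) matrices
pickup  <- cbind(c(-74.00394, -73.97386, -73.95441),
                 c(40.74289, 40.75219, 40.76442))
dropoff <- cbind(c(-73.99337, -73.95870, -73.97078),
                 c(40.73425, 40.77253, 40.75835))

# distm compares every pickup with every dropoff: a 3 x 3 matrix
all_pairs <- distm(pickup, dropoff)
dim(all_pairs)

# The distance for row i (pickup i to dropoff i) sits on the diagonal
row_wise <- diag(all_pairs)
row_wise
```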
I also tried:
dt[, distance := distm(x = c(pickup_longitude, pickup_latitude), y = c(dropoff_longitude, dropoff_latitude)), by = 1:nrow(dt)]
Is there a better and faster way to calculate the distance?
I tried this:
dt[, distance := distHaversine(matrix(c(pickup_longitude, pickup_latitude), ncol = 2),
                               matrix(c(dropoff_longitude, dropoff_latitude), ncol = 2))]
This works well.
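For reference, a self-contained sketch of the same vectorized approach on the sample data, with a cross-check against distm. It uses cbind instead of matrix(c(...), ncol = 2) purely as a stylistic choice, and the helper names pickup, dropoff and pairwise are illustrative. The check passes fun = distHaversine to distm explicitly, on the assumption that the default distance function of distm may differ between geosphere versions.

```r
library(data.table)
library(geosphere)

dt <- data.table(
  pickup_longitude  = c(-74.00394, -73.97386, -73.95441, -73.96234, -74.00466, -73.99557),
  pickup_latitude   = c(40.74289, 40.75219, 40.76442, 40.76722, 40.70743, 40.71602),
  dropoff_longitude = c(-73.99337, -73.95870, -73.97078, -73.97551, -73.99937, -73.99997),
  dropoff_latitude  = c(40.73425, 40.77253, 40.75835, 40.75687, 40.72152, 40.74332)
)

# One vectorized call: row i of the first matrix is paired with row i of the second.
# Columns must be in (longitude, latitude) order.
dt[, distance := distHaversine(cbind(pickup_longitude, pickup_latitude),
                               cbind(dropoff_longitude, dropoff_latitude))]

# Cross-check on this small sample: the diagonal of the all-pairs matrix
# should equal the row-by-row result.
pickup   <- as.matrix(dt[, .(pickup_longitude, pickup_latitude)])
dropoff  <- as.matrix(dt[, .(dropoff_longitude, dropoff_latitude)])
pairwise <- distm(pickup, dropoff, fun = distHaversine)
stopifnot(isTRUE(all.equal(dt$distance, diag(pairwise))))
```

With the default radius, distHaversine returns distances in metres. The cross-check is only meant for a handful of rows, since distm builds the full n by n matrix.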