Distance between coordinates in dataframe sequentially?

Question

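I have a data frame with lat/lon coordinates that are essentially GPS signals. I need to calculate the distance between consecutive rows, which I then use to check that it does not exceed a particular threshold I am interested in.

Here is a sample dataset: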

library(geosphere)
library(tidyverse)

Seqlat <- seq(from = -90, to = 90, by = .01)
Seqlong <- seq(from = -180, to = 180, by = .01)
Latitude <- sample(Seqlat, size = 100, replace = TRUE)
Longitude <- sample(Seqlong, size = 100, replace = TRUE)

df <- data.frame(Latitude, Longitude)

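I know I can use the geosphere::distm() function to calculate the distance between sets of coordinates. It works if I pull them out of the data frame individually: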


distm(c(df$Longitude[1], df$Latitude[1]),
  c(df$Longitude[2], df$Latitude[2]),
  fun = distHaversine)

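However, when I try to do this within the data frame, it doesn't work. I tried excluding the last row from the calculation, hoping to get the difference for all the other rows, but that didn't work...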

df %>% mutate(distance = ifelse(row_number() == n(), distm(
  c(Longitude, Latitude),
  c(lead(Longitude), lead(Latitude)),fun = distHaversine
), NA))

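Ideally, what I want is a new column holding the distance between each pair of consecutive coordinates. The last row would have no distance because there is no following row to compute it against.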

Answer 1

df["distance"] <- c(NA,
                    sapply(seq.int(2,nrow(df)), function(i){
                      distm(c(df$Longitude[i-1],df$Latitude[i-1]),
                            c(df$Longitude[i], df$Latitude[i]),
                            fun = distHaversine)
                    })
)

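This produces a vector that starts with NA for the first row. It then iterates through to the last row, computing each distance and adding it to the vector, so each row holds the distance from the previous row.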
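For comparison, the same result can probably be had without the explicit loop, since distHaversine() accepts two-column (longitude, latitude) matrices and pairs them row by row. The following is a minimal sketch of that idea, not the answer's own method; the `track` data frame is a made-up three-point example rather than the question's `df`.

```r
library(geosphere)

# Hypothetical track: three points on the equator, one degree apart.
track <- data.frame(Latitude = c(0, 0, 0), Longitude = c(0, 1, 2))
n <- nrow(track)

# Two-column (lon, lat) matrices: rows 1..n-1 and rows 2..n.
from <- cbind(track$Longitude[-n], track$Latitude[-n])
to   <- cbind(track$Longitude[-1], track$Latitude[-1])

# distHaversine() pairs the two matrices row by row and returns metres.
# NA is prepended so that, as in the loop above, row i holds the
# distance from row i - 1.
track$distance <- c(NA, distHaversine(from, to))
```

To put the NA on the last row instead, as the question asks, append it with c(distHaversine(from, to), NA).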

Answer 2

If you restructure the data frame slightly, this is easy to do in a dplyr pipeline.

library(dplyr)
library(geosphere)

df %>%
  mutate(across(everything(), lead, .names = '{.col}_next')) %>%
  rowwise() %>%
  mutate(dist = distm(c(Longitude, Latitude),c(Longitude_next, Latitude_next),
                 fun = distHaversine)[1]) %>%
  ungroup()  %>%
  select(-ends_with('next'))

#   Latitude Longitude      dist
#      <dbl>     <dbl>     <dbl>
# 1    87.2      -24.6 11575192.
# 2   -14.7     -100.  15515546.
# 3    -9.31     113.  17566695.
# 4     3.44     -88.7  8298367.
# 5    77.4     -106.  12966075.
# 6   -32.2     -172.  10435334.
# 7   -29.4      -55.7  8368057.
# 8    36.4      -94.6 15108192.
# 9    -3.76     118.  11331809.
#10   -27.6     -137.  14668975.
# … with 90 more rows

We create two additional columns, Longitude_next and Latitude_next, which hold the next row's values, and then apply the distm function on each row.
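The threshold check mentioned in the question could follow the same pattern. The sketch below is an assumption-laden variation, not part of the answer: it calls distHaversine() directly on lead() columns instead of using rowwise(), and both the `track` data and the `max_gap_m` threshold are invented for illustration. The last row's lead() values are NA, so its distance and flag should come out as NA.

```r
library(dplyr)
library(geosphere)

# Hypothetical track on the equator; the last hop is three degrees long.
track <- data.frame(Latitude = c(0, 0, 0, 0), Longitude = c(0, 1, 2, 5))

# Hypothetical threshold in metres (not from the original question).
max_gap_m <- 200000

checked <- track %>%
  mutate(
    # Distance in metres from each row to the next one.
    dist = distHaversine(cbind(Longitude, Latitude),
                         cbind(lead(Longitude), lead(Latitude))),
    # TRUE where the hop to the next row exceeds the threshold.
    over_threshold = dist > max_gap_m
  )
```

Rows that exceed the threshold can then be pulled out with filter(checked, over_threshold), which also drops the NA row.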

r

spatial

dplyr

geosphere