Calculating distance between multiple points at the same time of the day

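I have two data frames: one with my boat's GPS positions (5512 records) and another with fishing boat positions (35381 records). I want to calculate the distance between my boat and every other fishing boat that was in the area at the same time of day (to the minute).

I created an IDdatecode (yyyymmddhhmm) for every position, then merged the two data frames on that IDdatecode. This is what I did: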

merged_table <- merge(myboat, fishboats, by = "IDdatecode", all.y = TRUE)

To calculate the distance, I used the following call:

merged_table$distance_between_vessels=distm(c("lon1","lat1"),c("lon2","lat2"),fun=distGeo)

where lon1, lat1 are my boat's position and lon2, lat2 are the fishing boats' positions.

But I get the following error:

Error in `$<-.data.frame`(`*tmp*`, "distance_between_vessels", value = NA_real_) : 
  replacement has 1 row, data has 35652
In addition: Warning messages:
1: In .pointsToMatrix(x) : NAs introduced by coercion
2: In .pointsToMatrix(y) : NAs introduced by coercion

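What I have tried so far:

  1. Using a different function: merged_table$distance_between_vessels=distGeo(c("lon1","lat1"),c("lon2","lat2"))
  2. Wrapping all latitude and longitude columns in as.numeric
  3. Using only the time intervals in which both my boat and the fishing boats are present
  4. Ignoring the warnings and carrying on

But I still get nothing but a list of NAs.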

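I used the function distGeo on a simpler dataset (only my boat's positions), where I manually calculated the distance between the first and second points, then between the second and third, and so on. The function worked perfectly: it gave me the correct distance between two points (I checked in ArcGIS). This is what I did: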

distGeo(mydata[1, ], mydata[2, ])
distGeo(mydata[2, ], mydata[3, ])
distGeo(mydata[3, ], mydata[4, ])

So I want to calculate 'one-to-many' distances based on a unique time of day, but I get the error above. Any ideas why? Thanks :)

Here are the first 10 rows of my merged table:

structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L, 
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L), .Label = c("d201805081203", 
"d201805081204", "d201805081205", "d201805081206", "d201805081207", 
"d201805081208"), class = "factor"), lon1 = c(12.40203333, 12.4071, 
12.41165, 12.41165, 12.41485, 12.41485, 12.41663333, 12.41663333, 
12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333, 
45.11218333, 45.11303333, 45.11303333, 45.11313333, 45.11313333, 
45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718, 
13.02585827, 13.02453654, 13.02173, 13.02321482, 13.02052301, 
13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946, 
44.99031749, 44.99117498, 44.98792, 44.99203246, 44.98868065, 
44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L, 
1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II", 
"MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode", 
"lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA, 
-10L), class = "data.frame")

V2, update (17 January 2022)

Glad it worked for you. If you would rather avoid the for loop, you could consider a dplyr approach. Have a look.

  library(dplyr)
  
  df <- silvia %>%
    rowwise() %>% 
    mutate(distance = geosphere::distGeo(c(lon1, lat1), c(lon2, lat2)))
  df

The base R *apply family is another option.
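
As a rough sketch of that base R route, here is the same row-wise calculation with mapply(). The small data frame `boats` and the column names `dist_mapply` and `dist_vec` are made up for this illustration; `boats` just holds the first three rows of the example data so the snippet runs on its own. The second call assumes distGeo() accepts two-column (lon, lat) matrices and works through them row by row, which would make the loop unnecessary altogether.

```r
library(geosphere)  # provides distGeo()

# First three rows of the example data (hypothetical name "boats")
boats <- data.frame(
  lon1 = c(12.40203333, 12.4071, 12.41165),
  lat1 = c(45.1067, 45.10921667, 45.11218333),
  lon2 = c(13.02718, 13.02585827, 13.02453654),
  lat2 = c(44.98946, 44.99031749, 44.99117498)
)

# Option 1: mapply() walks the four columns in parallel, one row at a time
boats$dist_mapply <- mapply(
  function(lon1, lat1, lon2, lat2) distGeo(c(lon1, lat1), c(lon2, lat2)),
  boats$lon1, boats$lat1, boats$lon2, boats$lat2
)

# Option 2: pass two-column (lon, lat) matrices and let distGeo()
# pair them up row by row
boats$dist_vec <- distGeo(
  cbind(boats$lon1, boats$lat1),
  cbind(boats$lon2, boats$lat2)
)

boats
```

Both columns should hold the same distances in metres. On a table with tens of thousands of rows, the matrix form is probably the one to try first.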


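V1 (16 January 2022)

I hope this approach helps. It is generally recommended to avoid for loops in R. I used one anyway because it is easy to follow.

I made the following assumptions:

  • boat1 is your boat; lat1 and lon1 give the position of boat1 at any IDdatecode;
  • since I did not fully understand what you meant by "based on a unique time of day", I assumed that looping over every row would be enough;
  • the function distGeo() comes from the geosphere package.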
# loading your dataframe as "silvia"
silvia <- structure(list(Record = 1:10, IDdatecode = structure(c(1L, 2L, 3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L),
          .Label = c("d201805081203","d201805081204", "d201805081205", "d201805081206", "d201805081207", "d201805081208"),
          class = "factor"), lon1 = c(12.40203333, 12.4071, 12.41165, 12.41165, 12.41485, 12.41485, 12.41663333, 
          12.41663333, 12.41841667, 12.41841667), lat1 = c(45.1067, 45.10921667, 45.11218333, 45.11218333, 45.11303333, 
          45.11303333, 45.11313333, 45.11313333, 45.11348333, 45.11348333), boat1 = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L,
          1L, 1L, 1L), .Label = "RB", class = "factor"), lon2 = c(13.02718, 13.02585827, 13.02453654, 13.02173, 13.02321482,
          13.02052301, 13.02189309, 13.01931602, 13.02057136, 13.01810904), lat2 = c(44.98946, 44.99031749, 44.99117498, 44.98792,
          44.99203246, 44.98868065, 44.99288995, 44.98944129, 44.99374744, 44.99020194), boat2 = structure(c(1L, 1L, 1L, 2L,
          1L, 2L, 1L, 2L, 1L, 2L), .Label = c("IMPERO II", "MISTRAL"), class = "factor")), .Names = c("Record", "IDdatecode", 
          "lon1", "lat1", "boat1", "lon2", "lat2", "boat2"), row.names = c(NA, -10L), class = "data.frame")


# for EACH ROW in "silvia" calculate the distance between c("lon1", "lat1") and c("lon2", "lat2")
for (i in 1:nrow(silvia)){

  silvia$distance[i] <- geosphere::distGeo(c(silvia[i, "lon1"], silvia[i,"lat1"]), 
                                c(silvia[i, "lon2"], silvia[i,"lat2"])) 

}


# here you see the first 5 entries of the df "silvia"
# the distances are calculated in metres 
# the parameters a and f are set to WGS84 by default.
head(silvia, n=5)
#>   Record    IDdatecode     lon1     lat1 boat1     lon2     lat2     boat2
#> 1      1 d201805081203 12.40203 45.10670    RB 13.02718 44.98946 IMPERO II
#> 2      2 d201805081204 12.40710 45.10922    RB 13.02586 44.99032 IMPERO II
#> 3      3 d201805081205 12.41165 45.11218    RB 13.02454 44.99117 IMPERO II
#> 4      4 d201805081205 12.41165 45.11218    RB 13.02173 44.98792   MISTRAL
#> 5      5 d201805081206 12.41485 45.11303    RB 13.02321 44.99203 IMPERO II
#>   distance
#> 1 50943.77
#> 2 50503.93
#> 3 50118.46
#> 4 50005.52
#> 5 49774.51

Created on 2022-01-16 by the reprex package (v2.0.1)
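
A side note on why the original call most likely returned only NA: c("lon1", "lat1") is a character vector containing two column names, not the coordinates stored in those columns. The sketch below shows this in isolation and then passes the actual columns instead. The one-row `merged_table` is a stand-in built here for illustration, and `p`, `p1` and `p2` are hypothetical helper names.

```r
# c("lon1", "lat1") holds two strings, not the values of the columns
p <- c("lon1", "lat1")
is.character(p)

# Coercing those strings to numbers gives NA, which matches the
# "NAs introduced by coercion" warning in the question
suppressWarnings(as.numeric(p))

# Stand-in for the merged table: the first row of the example data
merged_table <- data.frame(lon1 = 12.40203333, lat1 = 45.1067,
                           lon2 = 13.02718,    lat2 = 44.98946)

# Refer to the columns themselves to get numeric (lon, lat) matrices
p1 <- cbind(merged_table$lon1, merged_table$lat1)
p2 <- cbind(merged_table$lon2, merged_table$lat2)
is.numeric(p1)

# One distance per row, in metres
merged_table$distance_between_vessels <- geosphere::distGeo(p1, p2)
merged_table
```

This would also explain the "replacement has 1 row" part of the error: the call produced a single NA value rather than one value per row of the merged table.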