Spatial point distance analysis by group in R

Question

I have a dataset that looks like this, but much larger:

## Fake data for stack
exdb <- data.frame(zone = c(1,1,1,2,2,2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))

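I need to run a point analysis between the STUDY sites and the COLLECT sites to get the distances in metres. The problem is that I have many different zones, or groups, which are all independent (i.e. the distances to points in zone 1 have nothing to do with the points in zone 2).

To do this I need to do two things:

a point analysis that calculates the distance (in metres) between the one study point and the multiple collect points in each zone,

and then a FOREACH or LOOP function that calculates the distances for every group in the dataset.

The optimal output would look like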

exdb <- data.frame(zone = c(1,1,1,2,2,2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815),
  dist = c(0, 10.3, 30.4, 0, 12.5, 11.2))

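The study site in each zone is always 0, since that is its distance to itself, and the distance to each collect site is calculated only relative to the study site in the same zone.

Many thanks.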


Answer 1

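Maybe something like this?

Assuming x and y are latitude and longitude, we can use the haversine function to get the distance in metres, after pivoting the table so that the two points the distance is calculated between sit in the same row: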

library(tidyverse)
library(pracma)
#> 
#> Attaching package: 'pracma'
#> The following object is masked from 'package:purrr':
#> 
#>     cross

data <- data.frame(zone = c(1, 1, 1, 2, 2, 2), site = c(
  "study", "collect", "collect", "study",
  "collect", "collect"
), x = c(
  53.307726, 53.310660, 53.307089,
  53.313831, 53.319087, 53.318792
), y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))

data %>%
  # values_fn = list keeps the two collect sites per zone as list-columns
  pivot_wider(names_from = site, values_from = c(x, y), values_fn = list) %>%
  unnest(c(y_collect, y_study, x_collect, x_study)) %>%
  mutate(
    # haversine() returns kilometres, so multiply by 1000 to get metres
    dist = list(x_study, y_study, x_collect, y_collect) %>%
      pmap_dbl(~ haversine(c(..1, ..2), c(..3, ..4)) * 1000)
  )
#> # A tibble: 4 x 6
#>    zone x_study x_collect y_study y_collect  dist
#>   <dbl>   <dbl>     <dbl>   <dbl>     <dbl> <dbl>
#> 1     1    53.3      53.3   -6.22     -6.22  472.
#> 2     1    53.3      53.3   -6.22     -6.22  484.
#> 3     2    53.3      53.3   -6.21     -6.22  659.
#> 4     2    53.3      53.3   -6.21     -6.22  563.

Created on 2021-09-13 by the reprex package (v2.0.1)
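
As a cross-check on the numbers above, here is a minimal base R sketch of the haversine formula itself. The function name haversine_m and the 6371 km mean Earth radius are assumptions made for illustration, not part of the answer's code, and x is again taken to be latitude and y longitude. For the zone 1 study site and its first collect site the result should land close to the first row of the output above.

```r
# Hypothetical helper: great-circle distance in metres on a sphere.
# radius_m = 6371000 is an assumed mean Earth radius.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(sqrt(a))
}

# Zone 1: study site versus its first collect site (x = latitude, y = longitude)
d <- haversine_m(53.307726, -6.222291, 53.310660, -6.217151)
print(d)
```

Distances of a few hundred metres are what these coordinates imply, which is why none of the answers reproduce the 10.3 or 30.4 in the question's example output.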

Answer 2

I'm still learning the spatial side of things, but does this work?

library(sf)
library(tidyverse)

exdb %>%
  arrange(zone, desc(site)) %>% # ensure study is first within each zone
  # coords must be (longitude, latitude); in exdb, y is longitude and x is latitude
  st_as_sf(coords = c("y", "x"), crs = 4326) %>%
  group_by(zone) %>%
  mutate(
    study_coord = geometry[1],
    dist = st_distance(geometry, study_coord, by_element = TRUE)
  )

Answer 3

I believe this should work.. but I could not reproduce the distances in your desired output.

library(data.table)
library(purrr) # Or tidyverse
library(geosphere)
# Make your data a data.table (mydata is your data frame, called exdb in the question)
setDT(mydata)
# Split to a list based on zone and site
L <- split(mydata, by = c("zone", "site"), flatten = FALSE)
# Loop over list
L <- lapply(L, function(zone) {
  # get reference point to take distance from
  point.study <- c(zone$study$y,zone$study$x)
  zone$study$dist <- 0
  # Calculate distance
  zone$collect$dist <- unlist(purrr::pmap( list(a = zone$collect$y, 
                                                b = zone$collect$x ), 
                                           ~(geosphere::distGeo( point.study, c(..1, ..2)))))
  return(zone)
})
# Rowbind the results together
data.table::rbindlist(lapply(L, data.table::rbindlist))
#    zone    site        x         y     dist
# 1:    1   study 53.30773 -6.222291   0.0000
# 2:    1 collect 53.31066 -6.217151 473.2943
# 3:    1 collect 53.30709 -6.215080 485.8806
# 4:    2   study 53.31383 -6.214152   0.0000
# 5:    2 collect 53.31909 -6.218723 659.5238
# 6:    2 collect 53.31879 -6.215815 563.1349

Answer 4

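A simple base R version that needs no packages other than geosphere, which supplies distGeo.

Start with exdb, as above.

First add a new column called dist with the value "study", because the plan is to self-merge on zone and site=="study":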

> exdb$dist = "study"

Self-merge, keeping only the coordinate columns:

> MM = merge(exdb, exdb,
    by.x=c("zone","site"),
    by.y=c("zone","dist"))[,c("x.x","y.x","x.y","y.y")]

Overwrite the dist column using distGeo from the geosphere package, which keeps things tidy:

> exdb$dist = geosphere::distGeo(MM[,2:1],MM[,4:3])
> exdb
  zone    site        x         y     dist
1    1   study 53.30773 -6.222291   0.0000
2    1 collect 53.31066 -6.217151 473.2943
3    1 collect 53.30709 -6.215080 485.8806
4    2   study 53.31383 -6.214152   0.0000
5    2 collect 53.31909 -6.218723 659.5238
6    2 collect 53.31879 -6.215815 563.1349

This returns the same answer as @wimpel's, but with no extra dependencies and fewer lines of code.
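
If even geosphere is unavailable, the per-zone lookup can be sketched with nothing but base R. This is only an illustration under assumptions: haversine_m is a hypothetical helper that treats the Earth as a sphere of radius 6371 km, so its results will differ slightly from distGeo, which works on an ellipsoid, and x is taken as latitude and y as longitude.

```r
exdb <- data.frame(zone = c(1, 1, 1, 2, 2, 2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))

# Hypothetical vectorised helper: spherical great-circle distance in metres.
haversine_m <- function(lat1, lon1, lat2, lon2, radius_m = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_m * asin(sqrt(a))
}

# One study row per zone; match() finds that row for every row of exdb,
# so no explicit loop over zones is needed.
study <- exdb[exdb$site == "study", ]
idx <- match(exdb$zone, study$zone)

# Distance from each row to the study site of its own zone (0 for study rows)
exdb$dist <- haversine_m(study$x[idx], study$y[idx], exdb$x, exdb$y)
print(exdb)
```

The match() lookup assumes exactly one study site per zone; with several study rows in a zone it would silently use the first one.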
