Spatial point distance analysis by group in R

I have a dataset that looks like this, but much larger:

## Fake data for stack
exdb <- data.frame(zone = c(1,1,1,2,2,2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))

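I need to run a point analysis between the STUDY site and the COLLECT sites to get the distances in metres. The catch is that I have many different zones, or groups, that are all independent (i.e. distances to points in zone 1 have nothing to do with points in zone 2).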

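To do this I need two things:

a point analysis that calculates the distance (in metres) between the one study point and the multiple collect points in each zone,

and then a FOREACH or LOOP function that calculates those distances for every group in the dataset.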

The ideal output would look like:

exdb <- data.frame(zone = c(1,1,1,2,2,2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815),
  dist = c(0, 10.3, 30.4, 0, 12.5, 11.2))

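The study site in each zone is always 0, because that is its distance to itself, and the distance for each collect site is measured only to the study site in the same zone.

Many thanks.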

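Maybe something like this?

Assuming x and y are latitude and longitude, we can pivot the table so that each row holds a study point and a collect point, then use the haversine function to get the distance between the two in metres: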

library(tidyverse)
library(pracma)
#> 
#> Attaching package: 'pracma'
#> The following object is masked from 'package:purrr':
#> 
#>     cross

data <- data.frame(zone = c(1, 1, 1, 2, 2, 2), site = c(
  "study", "collect", "collect", "study",
  "collect", "collect"
), x = c(
  53.307726, 53.310660, 53.307089,
  53.313831, 53.319087, 53.318792
), y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815))

data %>%
  # values_fn = list keeps the two collect sites per zone as list-columns
  pivot_wider(names_from = site, values_from = c(x, y), values_fn = list) %>%
  unnest(c(y_collect, y_study, x_collect, x_study)) %>%
  mutate(
    # haversine() returns kilometres, so multiply by 1000 to get metres
    dist = list(x_study, y_study, x_collect, y_collect) %>% pmap_dbl(~haversine(c(..1, ..2), c(..3, ..4)) * 1000)
  )
#> # A tibble: 4 x 6
#>    zone x_study x_collect y_study y_collect  dist
#>   <dbl>   <dbl>     <dbl>   <dbl>     <dbl> <dbl>
#> 1     1    53.3      53.3   -6.22     -6.22  472.
#> 2     1    53.3      53.3   -6.22     -6.22  484.
#> 3     2    53.3      53.3   -6.21     -6.22  659.
#> 4     2    53.3      53.3   -6.21     -6.22  563.

Created on 2021-09-13 by the reprex package (v2.0.1)
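
For anyone wondering what the haversine call is doing under the hood, here is a minimal base R sketch of the formula. The function name haversine_m and the spherical Earth radius of 6371 km are assumptions of this sketch, not something taken from the answer above, and a sphere is only an approximation of the real ellipsoid.

```r
# Haversine great-circle distance in metres, base R only.
# Assumptions: inputs are decimal degrees; Earth is a sphere of radius 6371 km.
haversine_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin() guards against rounding pushing sqrt(a) slightly above 1
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Zone 1 from the question: column x holds latitude, column y holds longitude.
# The study point is recycled against the two collect points.
d <- haversine_m(53.307726, -6.222291,
                 c(53.310660, 53.307089), c(-6.217151, -6.215080))
round(d)
```

The two values should land close to the zone 1 distances in the tibble above, since both use a spherical model.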

I'm still learning the spatial side of things, but would this work?

library(sf)
library(tidyverse)

exdb %>%
  arrange(zone, desc(site)) %>% # ensure study is first within each zone
  # x holds latitude and y holds longitude; sf expects coords as (longitude, latitude)
  st_as_sf(coords = c("y", "x"), crs = 4326) %>%
  group_by(zone) %>%
  mutate(
    study_coord = geometry[1],
    dist = st_distance(geometry, study_coord, by_element = TRUE)
  )

I believe this should work, but I couldn't reproduce the distances in your desired output.

library(data.table)
library(purrr) # Or tidyverse
library(geosphere)
# Make your data a data.table (mydata is the data frame from the question, i.e. exdb)
setDT(mydata)
# Split to a list based on zone and site
L <- split(mydata, by = c("zone", "site"), flatten = FALSE)
# Loop over list
L <- lapply(L, function(zone) {
  # get reference point to take distance from (longitude, latitude order for geosphere)
  point.study <- c(zone$study$y,zone$study$x)
  zone$study$dist <- 0
  # Calculate distance
  zone$collect$dist <- unlist(purrr::pmap( list(a = zone$collect$y, 
                                                b = zone$collect$x ), 
                                           ~(geosphere::distGeo( point.study, c(..1, ..2)))))
  return(zone)
})
# Rowbind the results together
data.table::rbindlist(lapply(L, data.table::rbindlist))
#    zone    site        x         y     dist
# 1:    1   study 53.30773 -6.222291   0.0000
# 2:    1 collect 53.31066 -6.217151 473.2943
# 3:    1 collect 53.30709 -6.215080 485.8806
# 4:    2   study 53.31383 -6.214152   0.0000
# 5:    2 collect 53.31909 -6.218723 659.5238
# 6:    2 collect 53.31879 -6.215815 563.1349
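
Since the question asked for a loop, the same per-zone idea can also be sketched as a plain for loop in base R. This is only a rough sketch: approx_dist_m is a hypothetical helper using an equirectangular approximation on a sphere of radius 6371 km (reasonable for points a few hundred metres apart, but not the ellipsoidal distGeo calculation used above), and it assumes exactly one study row per zone.

```r
# Hypothetical helper: approximate distance in metres between nearby points,
# using an equirectangular projection on a sphere (an assumption of this sketch).
approx_dist_m <- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad <- pi / 180
  mean_lat <- (lat1 + lat2) / 2 * to_rad
  dx <- (lon2 - lon1) * to_rad * cos(mean_lat)  # east-west component
  dy <- (lat2 - lat1) * to_rad                  # north-south component
  radius * sqrt(dx^2 + dy^2)
}

exdb <- data.frame(
  zone = c(1, 1, 1, 2, 2, 2),
  site = c("study", "collect", "collect", "study", "collect", "collect"),
  x = c(53.307726, 53.310660, 53.307089, 53.313831, 53.319087, 53.318792),
  y = c(-6.222291, -6.217151, -6.215080, -6.214152, -6.218723, -6.215815)
)

exdb$dist <- NA_real_
for (z in unique(exdb$zone)) {
  in_zone <- exdb$zone == z
  study <- exdb[in_zone & exdb$site == "study", ]
  stopifnot(nrow(study) == 1)  # assumes exactly one study site per zone
  # x is latitude, y is longitude; the study row itself gets distance 0
  exdb$dist[in_zone] <- approx_dist_m(study$x, study$y,
                                      exdb$x[in_zone], exdb$y[in_zone])
}
exdb
```

Expect the distances to differ slightly from the distGeo results above, because the Earth model is different.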

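A simple base R version; apart from geosphere (for distGeo), no other packages are needed.

Start with exdb, as above.

First add a new column called dist with the value "study", because the plan is to self-merge on zone and site == "study":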

> exdb$dist = "study"

Self-merge, keeping only the coordinate columns:

> MM = merge(exdb, exdb,
    by.x=c("zone","site"),
    by.y=c("zone","dist"))[,c("x.x","y.x","x.y","y.y")]

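Overwrite the dist column using distGeo, which expects (longitude, latitude), hence the reversed column order. This keeps things tidy:

> library(geosphere)  # provides distGeo
> exdb$dist = distGeo(MM[,2:1],MM[,4:3])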
> exdb
  zone    site        x         y     dist
1    1   study 53.30773 -6.222291   0.0000
2    1 collect 53.31066 -6.217151 473.2943
3    1 collect 53.30709 -6.215080 485.8806
4    2   study 53.31383 -6.214152   0.0000
5    2 collect 53.31909 -6.218723 659.5238
6    2 collect 53.31879 -6.215815 563.1349

Returns the same answer as @wimpel, but with no dependencies beyond geosphere and fewer lines of code.