How do I spatially rarefy a list of data.frames of species occurrence data?

I have a list of data frames representing species movements (split by individual and month):

head(TD_track_group)     
<list_of<
      tbl_df<
        x_    : double
        y_    : double
        t_    : datetime<UTC>
        ind.id: factor<26bd3>
        m_    : integer
      >
    >[6]>
    [[1]]
    # A tibble: 412 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  19.4  13.2 2015-01-01 09:40:23 BAV7       1
     2  19.1  13.2 2015-01-01 10:40:06 BAV7       1
     3  18.8  13.0 2015-01-01 11:40:06 BAV7       1
     4  18.5  13.0 2015-01-01 12:40:06 BAV7       1
     5  18.3  13.0 2015-01-01 13:30:06 BAV7       1
     6  18.0  12.8 2015-01-01 14:30:06 BAV7       1
     7  18.0  12.8 2015-01-01 15:30:07 BAV7       1
     8  18.0  12.8 2015-01-02 09:40:23 BAV7       1
     9  18.0  12.8 2015-01-02 10:40:06 BAV7       1
    10  18.0  12.8 2015-01-02 11:40:06 BAV7       1
    # ... with 402 more rows
    
    [[2]]
    # A tibble: 392 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  17.0  12.2 2015-02-01 05:20:08 BAV7       2
     2  17.0  12.2 2015-02-01 05:30:07 BAV7       2
     3  17.0  12.2 2015-02-01 06:30:06 BAV7       2
     4  17.0  12.2 2015-02-01 07:30:06 BAV7       2
     5  17.0  12.2 2015-02-01 08:30:06 BAV7       2
     6  16.9  12.2 2015-02-01 09:30:06 BAV7       2
     7  16.8  12.3 2015-02-01 10:30:06 BAV7       2
     8  16.8  12.4 2015-02-01 11:30:06 BAV7       2
     9  16.8  12.5 2015-02-01 12:30:07 BAV7       2
    10  16.8  12.5 2015-02-01 13:30:08 BAV7       2
    # ... with 382 more rows
    
    [[3]]
    # A tibble: 14 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  17.1  12.5 2015-03-01 05:10:07 BAV7       3
     2  17.1  12.5 2015-03-01 05:30:07 BAV7       3
     3  17.1  12.6 2015-03-01 06:30:06 BAV7       3
     4  17.1  12.5 2015-03-01 07:30:06 BAV7       3
     5  17.1  12.6 2015-03-01 08:30:06 BAV7       3
     6  17.1  12.6 2015-03-01 09:30:07 BAV7       3
     7  17.1  12.5 2015-03-01 10:30:06 BAV7       3
     8  17.2  12.6 2015-03-01 11:30:06 BAV7       3
     9  17.3  12.7 2015-03-01 12:30:06 BAV7       3
    10  17.3  12.8 2015-03-01 13:30:07 BAV7       3
    11  17.3  12.8 2015-03-01 14:30:06 BAV7       3
    12  17.3  12.8 2015-03-01 15:30:07 BAV7       3
    13  17.3  12.8 2015-03-01 16:30:07 BAV7       3
    14  17.1  12.5 2015-03-01 02:00:23 BAV7       3
    
    [[4]]
    # A tibble: 37 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  27.9  17.0 2014-09-28 07:55:07 BAV7       9
     2  28.0  16.9 2014-09-28 08:30:06 BAV7       9
     3  28.1  16.7 2014-09-28 09:35:07 BAV7       9
     4  28.0  16.5 2014-09-28 10:30:06 BAV7       9
     5  27.8  16.3 2014-09-28 11:30:07 BAV7       9
     6  27.6  16.1 2014-09-28 12:30:07 BAV7       9
     7  27.3  15.8 2014-09-28 13:30:08 BAV7       9
     8  26.9  15.5 2014-09-28 14:30:06 BAV7       9
     9  26.9  15.4 2014-09-28 15:30:07 BAV7       9
    10  26.9  15.4 2014-09-29 04:05:07 BAV7       9
    # ... with 27 more rows
    
    [[5]]
    # A tibble: 434 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  23.9  14.8 2014-10-01 04:15:07 BAV7      10
     2  23.9  14.8 2014-10-01 04:30:06 BAV7      10
     3  23.9  14.8 2014-10-01 05:30:07 BAV7      10
     4  23.9  14.8 2014-10-01 06:30:06 BAV7      10
     5  23.9  14.8 2014-10-01 07:30:06 BAV7      10
     6  23.9  14.8 2014-10-01 08:30:07 BAV7      10
     7  23.9  14.8 2014-10-01 09:30:07 BAV7      10
     8  23.8  14.7 2014-10-01 10:30:06 BAV7      10
     9  23.9  14.6 2014-10-01 11:30:07 BAV7      10
    10  23.9  14.4 2014-10-01 12:30:07 BAV7      10
    # ... with 424 more rows
    
    [[6]]
    # A tibble: 420 x 5
          x_    y_ t_                  ind.id    m_
       <dbl> <dbl> <dttm>              <fct>  <int>
     1  25.7  13.2 2014-11-01 04:15:07 BAV7      11
     2  25.7  13.2 2014-11-01 04:30:06 BAV7      11
     3  25.7  13.2 2014-11-01 05:30:07 BAV7      11
     4  25.7  13.3 2014-11-01 06:30:06 BAV7      11
     5  25.7  13.2 2014-11-01 07:30:07 BAV7      11
     6  25.7  13.2 2014-11-01 08:30:07 BAV7      11
     7  25.7  13.3 2014-11-01 09:30:08 BAV7      11
     8  25.6  13.3 2014-11-01 10:30:09 BAV7      11
     9  25.7  13.2 2014-11-01 11:30:07 BAV7      11
    10  25.7  13.3 2014-11-01 12:30:06 BAV7      11
    # ... with 410 more rows

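How can I spatially thin each group's location data so that occurrence points are separated by a given minimum distance (for example, 1 km)?

Doing it by hand looks something like this: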

    TD_group1_df <- as.data.frame(TD_track_group[[1]])
    TD_group2_df <- as.data.frame(TD_track_group[[2]])
    TD_group3_df <- as.data.frame(TD_track_group[[3]])
    
    # Create a SpatialPointsDataFrame and apply remove.near()
    TD_1_xy <- TD_group1_df[, 1:2]
    TD_1_data <- TD_group1_df[, 3:4]
    TD_1_sp <-
      SpatialPointsDataFrame(coords = TD_1_xy,
                             data = TD_1_data,
                             proj4string = crs)
    
    TD_1_th <- remove.near(TD_1_sp, dist = thin_distance)
    
    TD_2_xy <- TD_group2_df[, 1:2]
    TD_2_data <- TD_group2_df[, 3:4]
    TD_2_sp <-
      SpatialPointsDataFrame(coords = TD_2_xy,
                             data = TD_2_data,
                             proj4string = crs)
    
    
    TD_2_th <- remove.near(TD_2_sp, dist = thin_distance)
    
    TD_3_xy <- TD_group3_df[, 1:2]
    TD_3_data <- TD_group3_df[, 3:4]
    TD_3_sp <-
      SpatialPointsDataFrame(coords = TD_3_xy,
                             data = TD_3_data,
                             proj4string = crs)
    TD_3_th <- remove.near(TD_3_sp, dist = thin_distance)
    TD_thinned <-
      rbind(
        TD_1_th,
        TD_2_th,
        TD_3_th)

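But that is not practical for a list of more than 100 data.frames. Is there a way to iterate the creation of the SpatialPointsDataFrames and then run remove.near() over all of the data.frames in one go?

Edit: using lapply, I get the following error: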

crs <- CRS("+init=epsg:4329")

thin_distance <- 1 #kilometres


xy_groups <- lapply(TD_track_group, "[", , c("x_", "y_"))

data_groups <- lapply(TD_track_group, "[", , c("t_", "ind.id", "m_"))

SPDF_groups <-
  lapply(TD_track_group,
         SpatialPointsDataFrame,
         coords = xy_groups,
         data = data_groups,
         proj4string = crs)

Error in h(simpleError(msg, call)) : 
  error in evaluating the argument 'obj' in selecting a method for function 'coordinates': arguments imply differing number of rows: 294, 280, 10, 26, 310, 300, 267, 9, 90, 91, 46, 153, 231, 237, 247, 248, 86, 3, 25, 228, 224, 245, 252, 108, 222, 226, 219, 216, 150, 175, 151, 12, 149, 147, 48, 141, 119, 95, 7, 18, 22, 20, 23, 28, 2, 21, 24, 16, 11, 29, 15, 218, 125, 50, 176, 54, 210, 197, 238, 202, 235, 143, 81, 195, 63, 158, 33, 159, 192, 133, 199, 127, 180, 83, 5, 78, 17, 60, 157, 196, 303, 188, 174, 14, 99, 164, 268, 250, 223, 135, 217, 266, 265, 74, 43, 13, 155, 156, 112, 105, 233, 77

SpatThin_g <- lapply(SPDF_groups, remove.near, dist = thin_distance)

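You already have your own answer. The key is iterating over the groups. I would suggest writing a function that handles a single group and then applying it to every group.

For example: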

# customized function for one group
myfunction <- function(x,
                       coord_cols = c(1, 2),
                       data_cols = c(3, 4),
                       crs = CRS("+init=epsg:4329"),
                       thin_distance = 1) {
  x <- as.data.frame(x)
  TD_1_sp <-
    SpatialPointsDataFrame(coords = x[, coord_cols],
                           data = x[, data_cols],
                           proj4string = crs)

  remove.near(TD_1_sp, dist = thin_distance)
}

# test with one group
myfunction(TD_track_group[[1]])

#Apply to all groups
lapply(TD_track_group, myfunction)
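
The lapply attempt in the question most likely fails because lapply passes its extra arguments unchanged to every call, so each call receives the whole xy_groups and data_groups lists rather than one element of each. The sketch below illustrates the one-function-per-group pattern on toy data in base R only. Everything in it is an assumption for illustration: toy_groups stands in for TD_track_group, and thin_one is a hypothetical greedy thinning helper using plain Euclidean distance, not remove.near().

```r
# Toy stand-in for TD_track_group: a list of data frames with x_/y_ columns.
toy_groups <- list(
  data.frame(x_ = c(0, 0.1, 5, 5.2, 10), y_ = c(0, 0.1, 5, 5.1, 10), m_ = 1L),
  data.frame(x_ = c(1, 1.05, 3), y_ = c(1, 1.05, 3), m_ = 2L)
)

# Hypothetical helper (NOT remove.near()): greedy thinning that keeps a point
# only if it lies at least `min_dist` from every point already kept.
# Distances are plain Euclidean, in the units of x_ and y_.
thin_one <- function(df, min_dist = 1) {
  keep <- integer(0)
  for (i in seq_len(nrow(df))) {
    d <- sqrt((df$x_[keep] - df$x_[i])^2 + (df$y_[keep] - df$y_[i])^2)
    if (all(d >= min_dist)) keep <- c(keep, i)
  }
  df[keep, , drop = FALSE]
}

# Apply the single-group function to every group; extra arguments after the
# function name are passed as-is to each call.
thinned_list <- lapply(toy_groups, thin_one, min_dist = 1)

# Combine the per-group results into one data frame.
thinned_all <- do.call(rbind, thinned_list)

# If coordinates and attributes are already split into parallel lists,
# Map() pairs them element by element (lapply would pass each list whole).
xy_list <- lapply(toy_groups, function(df) df[, c("x_", "y_")])
data_list <- lapply(toy_groups, function(df) df[, "m_", drop = FALSE])
paired <- Map(function(xy, dat) cbind(xy, dat), xy_list, data_list)
```

With the real data, do.call(rbind, ...) on the list returned by lapply(TD_track_group, myfunction) should play the same role as the manual rbind of TD_1_th, TD_2_th and TD_3_th, provided every element has the same columns.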

To go further, I would suggest learning about:

  • Loops/iteration,
  • Controls/conditionals, and
  • Functions

Best