How do I spatially rarefy a list of data.frames of species occurrence data?
I have a list of data frames representing species movement (split by individual and month):
head(TD_track_group)
<list_of<
tbl_df<
x_ : double
y_ : double
t_ : datetime<UTC>
ind.id: factor<26bd3>
m_ : integer
>
>[6]>
[[1]]
# A tibble: 412 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 19.4 13.2 2015-01-01 09:40:23 BAV7 1
2 19.1 13.2 2015-01-01 10:40:06 BAV7 1
3 18.8 13.0 2015-01-01 11:40:06 BAV7 1
4 18.5 13.0 2015-01-01 12:40:06 BAV7 1
5 18.3 13.0 2015-01-01 13:30:06 BAV7 1
6 18.0 12.8 2015-01-01 14:30:06 BAV7 1
7 18.0 12.8 2015-01-01 15:30:07 BAV7 1
8 18.0 12.8 2015-01-02 09:40:23 BAV7 1
9 18.0 12.8 2015-01-02 10:40:06 BAV7 1
10 18.0 12.8 2015-01-02 11:40:06 BAV7 1
# ... with 402 more rows
[[2]]
# A tibble: 392 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 17.0 12.2 2015-02-01 05:20:08 BAV7 2
2 17.0 12.2 2015-02-01 05:30:07 BAV7 2
3 17.0 12.2 2015-02-01 06:30:06 BAV7 2
4 17.0 12.2 2015-02-01 07:30:06 BAV7 2
5 17.0 12.2 2015-02-01 08:30:06 BAV7 2
6 16.9 12.2 2015-02-01 09:30:06 BAV7 2
7 16.8 12.3 2015-02-01 10:30:06 BAV7 2
8 16.8 12.4 2015-02-01 11:30:06 BAV7 2
9 16.8 12.5 2015-02-01 12:30:07 BAV7 2
10 16.8 12.5 2015-02-01 13:30:08 BAV7 2
# ... with 382 more rows
[[3]]
# A tibble: 14 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 17.1 12.5 2015-03-01 05:10:07 BAV7 3
2 17.1 12.5 2015-03-01 05:30:07 BAV7 3
3 17.1 12.6 2015-03-01 06:30:06 BAV7 3
4 17.1 12.5 2015-03-01 07:30:06 BAV7 3
5 17.1 12.6 2015-03-01 08:30:06 BAV7 3
6 17.1 12.6 2015-03-01 09:30:07 BAV7 3
7 17.1 12.5 2015-03-01 10:30:06 BAV7 3
8 17.2 12.6 2015-03-01 11:30:06 BAV7 3
9 17.3 12.7 2015-03-01 12:30:06 BAV7 3
10 17.3 12.8 2015-03-01 13:30:07 BAV7 3
11 17.3 12.8 2015-03-01 14:30:06 BAV7 3
12 17.3 12.8 2015-03-01 15:30:07 BAV7 3
13 17.3 12.8 2015-03-01 16:30:07 BAV7 3
14 17.1 12.5 2015-03-01 02:00:23 BAV7 3
[[4]]
# A tibble: 37 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 27.9 17.0 2014-09-28 07:55:07 BAV7 9
2 28.0 16.9 2014-09-28 08:30:06 BAV7 9
3 28.1 16.7 2014-09-28 09:35:07 BAV7 9
4 28.0 16.5 2014-09-28 10:30:06 BAV7 9
5 27.8 16.3 2014-09-28 11:30:07 BAV7 9
6 27.6 16.1 2014-09-28 12:30:07 BAV7 9
7 27.3 15.8 2014-09-28 13:30:08 BAV7 9
8 26.9 15.5 2014-09-28 14:30:06 BAV7 9
9 26.9 15.4 2014-09-28 15:30:07 BAV7 9
10 26.9 15.4 2014-09-29 04:05:07 BAV7 9
# ... with 27 more rows
[[5]]
# A tibble: 434 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 23.9 14.8 2014-10-01 04:15:07 BAV7 10
2 23.9 14.8 2014-10-01 04:30:06 BAV7 10
3 23.9 14.8 2014-10-01 05:30:07 BAV7 10
4 23.9 14.8 2014-10-01 06:30:06 BAV7 10
5 23.9 14.8 2014-10-01 07:30:06 BAV7 10
6 23.9 14.8 2014-10-01 08:30:07 BAV7 10
7 23.9 14.8 2014-10-01 09:30:07 BAV7 10
8 23.8 14.7 2014-10-01 10:30:06 BAV7 10
9 23.9 14.6 2014-10-01 11:30:07 BAV7 10
10 23.9 14.4 2014-10-01 12:30:07 BAV7 10
# ... with 424 more rows
[[6]]
# A tibble: 420 x 5
x_ y_ t_ ind.id m_
<dbl> <dbl> <dttm> <fct> <int>
1 25.7 13.2 2014-11-01 04:15:07 BAV7 11
2 25.7 13.2 2014-11-01 04:30:06 BAV7 11
3 25.7 13.2 2014-11-01 05:30:07 BAV7 11
4 25.7 13.3 2014-11-01 06:30:06 BAV7 11
5 25.7 13.2 2014-11-01 07:30:07 BAV7 11
6 25.7 13.2 2014-11-01 08:30:07 BAV7 11
7 25.7 13.3 2014-11-01 09:30:08 BAV7 11
8 25.6 13.3 2014-11-01 10:30:09 BAV7 11
9 25.7 13.2 2014-11-01 11:30:07 BAV7 11
10 25.7 13.3 2014-11-01 12:30:06 BAV7 11
# ... with 410 more rows
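How can I spatially thin the species location data of each group based on a specific minimum distance between occurrence points (e.g. 1 km)?
Done manually, it would look something like this: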
TD_group1_df <- as.data.frame(TD_track_group[[1]])
TD_group2_df <- as.data.frame(TD_track_group[[2]])
TD_group3_df <- as.data.frame(TD_track_group[[3]])
# Create a SpatialPointsDataFrame for each group and apply remove.near()
TD_1_xy <- TD_group1_df[, 1:2]
TD_1_data <- TD_group1_df[, 3:4]
TD_1_sp <-
SpatialPointsDataFrame(coords = TD_1_xy,
data = TD_1_data,
proj4string = crs)
TD_1_th <- remove.near(TD_1_sp, dist = thin_distance)
TD_2_xy <- TD_group2_df[, 1:2]
TD_2_data <- TD_group2_df[, 3:4]
TD_2_sp <-
SpatialPointsDataFrame(coords = TD_2_xy,
data = TD_2_data,
proj4string = crs)
TD_2_th <- remove.near(TD_2_sp, dist = thin_distance)
TD_3_xy <- TD_group3_df[, 1:2]
TD_3_data <- TD_group3_df[, 3:4]
TD_3_sp <-
SpatialPointsDataFrame(coords = TD_3_xy,
data = TD_3_data,
proj4string = crs)
TD_3_th <- remove.near(TD_3_sp, dist = thin_distance)
TD_thinned <-
rbind(
TD_1_th,
TD_2_th,
TD_3_th)
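But this is not practical for a list of more than 100 data.frames. Is there a way to iterate the creation of the SpatialPointsDataFrames and then run remove.near() over all of the data.frames in one go?
Edit:
Using lapply, I get the following error: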
crs <- CRS("+init=epsg:4329")
thin_distance <- 1 #kilometres
xy_groups <- lapply(TD_track_group, "[", , c("x_", "y_"))
data_groups <- lapply(TD_track_group, "[", , c("t_", "ind.id", "m_"))
SPDF_groups <-
lapply(TD_track_group,
SpatialPointsDataFrame,
coords = xy_groups,
data = data_groups,
proj4string = crs)
Error in h(simpleError(msg, call)) :
error in evaluating the argument 'obj' in selecting a method for function 'coordinates': arguments imply differing number of rows: 294, 280, 10, 26, 310, 300, 267, 9, 90, 91, 46, 153, 231, 237, 247, 248, 86, 3, 25, 228, 224, 245, 252, 108, 222, 226, 219, 216, 150, 175, 151, 12, 149, 147, 48, 141, 119, 95, 7, 18, 22, 20, 23, 28, 2, 21, 24, 16, 11, 29, 15, 218, 125, 50, 176, 54, 210, 197, 238, 202, 235, 143, 81, 195, 63, 158, 33, 159, 192, 133, 199, 127, 180, 83, 5, 78, 17, 60, 157, 196, 303, 188, 174, 14, 99, 164, 268, 250, 223, 135, 217, 266, 265, 74, 43, 13, 155, 156, 112, 105, 233, 77
SpatThin_g <- lapply(SPDF_groups, remove.near, dist = thin_distance)
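The error most likely arises because lapply() forwards extra arguments unchanged: every call receives the entire xy_groups and data_groups lists instead of one element of each, and the row counts in the message look like the sizes of the individual groups. Below is a small base-R sketch of the difference, using toy data and a hypothetical helper (rows_received is made up for illustration and is not part of sp or of the code above):

```r
# Toy stand-in for TD_track_group: two groups with different row counts.
groups <- list(
  data.frame(x_ = c(19.4, 19.1, 18.8), y_ = c(13.2, 13.2, 13.0), m_ = 1L),
  data.frame(x_ = c(17.0, 16.9),       y_ = c(12.2, 12.2),       m_ = 2L)
)
xy_groups   <- lapply(groups, function(g) g[, c("x_", "y_")])
data_groups <- lapply(groups, function(g) g[, "m_", drop = FALSE])

# Hypothetical helper: reports how many coordinate rows it was given.
# It returns NA when `coords` is a bare list rather than a single data.frame.
rows_received <- function(x, coords) {
  if (is.data.frame(coords)) nrow(coords) else NA_integer_
}

# lapply()/vapply() pass `coords = xy_groups` as-is, so each call sees the full list.
via_lapply <- vapply(groups, rows_received, integer(1), coords = xy_groups)

# Map() walks the lists in parallel: call i gets xy_groups[[i]] and data_groups[[i]].
via_map <- unlist(Map(function(xy, d) nrow(xy), xy_groups, data_groups))

print(via_lapply)
print(via_map)
```

With the real data, the same pairing could presumably be written as Map(function(xy, d) SpatialPointsDataFrame(coords = as.data.frame(xy), data = as.data.frame(d), proj4string = crs), xy_groups, data_groups), so that each constructor call only sees the coordinates and attributes of one group.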
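You already have your own answer. The key is iterating over the groups. I would suggest writing a function that handles a single group and then applying it to every group.
For example:
library(sp)  # provides SpatialPointsDataFrame() and CRS()
# customized function for one group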
myfunction <- function(x,
coord_cols = c(1, 2),
data_cols = c(3, 4),
crs = CRS("+init=epsg:4329"),
thin_distance = 1){
x <- as.data.frame(x)
TD_1_sp <-
SpatialPointsDataFrame(coords = x[, coord_cols],
data = x[, data_cols],
proj4string = crs)
remove.near(TD_1_sp, dist = thin_distance)
}
# test with one group
myfunction(TD_track_group[[1]])
#Apply to all groups
lapply(TD_track_group, myfunction)
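The same one-function-per-group pattern also covers the last step of the question, binding the thinned groups back into one object with do.call(rbind, ...). The sketch below is self-contained base R and makes several assumptions that are not confirmed by the question: x_ is treated as longitude and y_ as latitude in decimal degrees, the minimum distance is in kilometres, and thin_group is a simple greedy stand-in written for illustration, not the algorithm used by remove.near():

```r
# Great-circle distance in km between one point and a vector of points (haversine).
# ASSUMPTION: coordinates are longitude/latitude in decimal degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical greedy thinning for ONE group: a row is kept only if it lies
# at least `min_km` away from every row kept so far (the first row is always kept).
thin_group <- function(df, min_km = 1) {
  keep <- integer(0)
  for (i in seq_len(nrow(df))) {
    d <- haversine_km(df$x_[i], df$y_[i], df$x_[keep], df$y_[keep])
    if (length(keep) == 0 || all(d >= min_km)) keep <- c(keep, i)
  }
  df[keep, , drop = FALSE]
}

# Toy stand-in for TD_track_group.
groups <- list(
  data.frame(x_ = c(19.400, 19.401, 19.500), y_ = c(13.2, 13.2, 13.2), m_ = 1L),
  data.frame(x_ = c(17.0, 17.0),             y_ = c(12.2, 12.2),       m_ = 2L)
)

# Apply the single-group function to every group, then bind the results.
thinned <- lapply(groups, thin_group, min_km = 1)
TD_thinned_demo <- do.call(rbind, thinned)
print(TD_thinned_demo)
```

Because do.call(rbind, ...) takes the whole list at once, the combining step does not grow with the number of groups the way the hand-written rbind(TD_1_th, TD_2_th, TD_3_th) does. Whether rbind works the same way on the objects returned by remove.near() depends on their class, so that part is worth checking on one or two groups first.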
For more details, I would suggest learning about:
- Loops/iteration,
- Controls/conditionals, and
- Functions
Best