Aggregate sf points based on distance
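I want to compute the mean of all variables in a SpatialPointsDataFrame for points that lie within a specified distance of each other. I have a way to do this, but it seems like a silly way to solve the problem. Any ideas for doing this with modern, tidy-style syntax would be appreciated.
To start, I have a SpatialPointsDataFrame with several variables measured at each point. I'd like to get the mean of all variables for the points within a specified distance. E.g., get the mean cadmium value from the meuse data for points that are within 100 m of each other: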
library(sf)
library(sp)
data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"),remove=FALSE)
pts100 <- st_is_within_distance(pts, dist = 100)
# can use sapply to get mean of a variable. E.g., cadmium
sapply(pts100, function(x){ mean(pts$cadmium[x]) })
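So, I figured out how to do this one variable at a time with sapply. If I wanted to, I could therefore calculate the mean of each variable, generate a centroid for each point, and then build a SpatialPointsDataFrame of the unique values. E.g., for the first few variables: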
res <- data.frame(id = 1:length(pts100),
                  x = NA, y = NA,
                  cadmium = NA, copper = NA, lead = NA)
res$x <- sapply(pts100, function(p){ mean(pts$x[p]) })
res$y <- sapply(pts100, function(p){ mean(pts$y[p]) })
res$cadmium <- sapply(pts100, function(p){ mean(pts$cadmium[p]) })
res$copper <- sapply(pts100, function(p){ mean(pts$copper[p]) })
res$lead <- sapply(pts100, function(p){ mean(pts$lead[p]) })
res2 <- res[duplicated(res$cadmium),]
coordinates(res2) <- c("x","y")
bubble(res2,"cadmium")
This works, but it seems cumbersome, and there must be a more efficient way.
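As a rough sketch of how the per-variable sapply calls could be collapsed into one pass, the neighbour list can be reused to average every numeric column at once. The names dat, num and nbr_means are hypothetical helpers introduced here for illustration, and the factor columns (ffreq, soil, lime, landuse) are simply dropped, which is an assumption about what "all variables" should mean.

```r
library(sf)
library(sp)

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)

# Neighbour list: element i holds the row indices within 100 units of point i.
# Each point is at distance 0 from itself, so it is always in its own group.
pts100 <- st_is_within_distance(pts, dist = 100)

# Plain data frame restricted to the numeric columns (assumption: factors are skipped).
dat <- st_drop_geometry(pts)
num <- dat[, sapply(dat, is.numeric), drop = FALSE]

# One vector of column means per point, stacked into a data frame.
nbr_means <- as.data.frame(
  do.call(rbind, lapply(pts100, function(idx) colMeans(num[idx, , drop = FALSE])))
)

head(nbr_means)
```

Row i of nbr_means then holds the means over point i and its neighbours, including the averaged x and y that the code above uses as centroids.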
The sf package appears to have an aggregate function with a join argument, where you can specify the type of join.
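library(sf)
library(sp)
library(dplyr)  # for %>% and summarise() in the spot check further down
data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)
# Neighbour list from the question, reused in the spot check
pts100 <- st_is_within_distance(pts, dist = 100)
# This will give lots of warnings since there are non-numeric columns
pts_agg <- aggregate(pts,
                     pts,
                     FUN = mean,
                     join = function(x, y) st_is_within_distance(x, y, dist = 100))
head(pts_agg)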
Simple feature collection with 6 features and 14 fields
geometry type: POINT
dimension: XY
bbox: xmin: 181025 ymin: 333260 xmax: 181390 ymax: 333611
CRS: NA
         x        y cadmium copper lead   zinc  elev        dist   om ffreq soil lime landuse dist.m
1 181048.5 333584.5   10.15     83  288 1081.5 7.446 0.006791165 13.8    NA   NA   NA      NA     40
2 181048.5 333584.5   10.15     83  288 1081.5 7.446 0.006791165 13.8    NA   NA   NA      NA     40
3 181165.0 333537.0    6.50     68  199  640.0 7.800 0.103029000 13.0    NA   NA   NA      NA    150
4 181298.0 333484.0    2.60     81  116  257.0 7.655 0.190094000  8.0    NA   NA   NA      NA    270
5 181307.0 333330.0    2.80     48  117  269.0 7.480 0.277090000  8.7    NA   NA   NA      NA    380
6 181390.0 333260.0    3.00     61  137  281.0 7.791 0.364067000  7.8    NA   NA   NA      NA    470
               geometry
1 POINT (181072 333611)
2 POINT (181025 333558)
3 POINT (181165 333537)
4 POINT (181298 333484)
5 POINT (181307 333330)
6 POINT (181390 333260)
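The warnings and the NA columns in that output come from taking mean() of the factor columns. One possible way around it, sketched here under the assumption that only the numeric variables matter, is to subset to numeric columns before aggregating. The names num_cols, pts_num and pts_agg_num are hypothetical and not part of the original answer.

```r
library(sf)
library(sp)

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)

# Names of the numeric attribute columns (geometry is excluded by st_drop_geometry).
num_cols <- names(which(sapply(st_drop_geometry(pts), is.numeric)))

# Column subsetting on an sf object keeps the geometry column attached.
pts_num <- pts[num_cols]

# Same aggregate call as above, but mean() now only sees numeric columns.
pts_agg_num <- aggregate(pts_num,
                         pts,
                         FUN = mean,
                         join = function(x, y) st_is_within_distance(x, y, dist = 100))

head(pts_agg_num)
```

The result should have one row per original point, with the geometry taken from the second argument, as in the unrestricted call.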
Spot-checking row 9 of pts, since it has a few matches in pts100:
> pts[pts100[[9]], 'cadmium'] %>% st_drop_geometry %>% summarise(mean = mean(cadmium))
  mean
1 2.25
> pts_agg[9,'cadmium']
Simple feature collection with 1 feature and 1 field
geometry type: POINT
dimension: XY
bbox: xmin: 181060 ymin: 333231 xmax: 181060 ymax: 333231
CRS: NA
  cadmium              geometry
9    2.25 POINT (181060 333231)
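Since the question asks for tidy-style syntax, here is a minimal sketch of the same neighbourhood mean written as a spatial join followed by a grouped summary. It assumes dplyr 1.0 or later (for across() and where()); centres, centre_id and tidy_means are hypothetical names introduced for illustration, and this is an alternative to, not a restatement of, the aggregate approach above.

```r
library(sf)
library(sp)
library(dplyr)

data(meuse)
pts <- st_as_sf(meuse, coords = c("x", "y"), remove = FALSE)

# One row per point, carrying only an id, to act as the "centre" of each group.
centres <- st_sf(centre_id = seq_len(nrow(pts)), geometry = st_geometry(pts))

# st_join passes dist on to st_is_within_distance, so every centre is paired
# with all points within 100 units of it (itself included).
tidy_means <- st_join(centres, pts, join = st_is_within_distance, dist = 100) %>%
  st_drop_geometry() %>%
  group_by(centre_id) %>%
  summarise(across(where(is.numeric), mean))

head(tidy_means)
```

Dropping the geometry before summarise() avoids unioning the point geometries of each group; the averaged x and y columns can be turned back into points with st_as_sf if centroids are needed.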