Portion of a raster cell covered by one or more polygons: is there a faster way to do this (in R)?
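Pictures say more than words, so please have a look at
[image: the example raster with the three polygons drawn on top]
What I have is
- a RasterLayer object (filled with random values here for illustration purposes only, the actual values don't matter)
- a SpatialPolygons object with lots and lots of polygons in it
You can re-create the example data I used for the image with the following code: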
library(sp)
library(raster)
library(rgeos)
# create example raster
r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
values(r) <- sample(x=1:1000, size=150)
# create example (Spatial) Polygons
p1 <- Polygon(coords=matrix(c(50, 100, 100, 50, 50, 15, 15, 35, 35, 15), nrow=5, ncol=2), hole=FALSE)
p2 <- Polygon(coords=matrix(c(77, 123, 111, 77, 43, 57, 66, 43), nrow=4, ncol=2), hole=FALSE)
p3 <- Polygon(coords=matrix(c(110, 125, 125, 110, 67, 75, 80, 67), nrow=4, ncol=2), hole=FALSE)
lots.of.polygons <- SpatialPolygons(list(Polygons(list(p1, p2, p3), 1)))
crs(lots.of.polygons) <- crs(r) # copy crs from raster to polygons (please ignore any potential problems related to projections etc. for now)
# plot both
plot(r) #values in this raster for illustration purposes only
plot(lots.of.polygons, add=TRUE)
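For each cell in the raster, I want to know how much of it is covered by one or more polygons. Or, more precisely: the area of all polygons within the raster cell, not counting anything that lies outside the cell in question. If several polygons overlap a cell, I only need their combined area.
The following code does what I want, but takes more than a week to run with the actual data sets: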
# empty the example raster (don't need the values):
values(r) <- NA
# copy of r that will hold the results
r.results <- r
for (i in 1:ncell(r)){
  r.cell <- r # fresh copy of the empty raster
  r.cell[i] <- 1 # set the ith cell to 1
  p <- rasterToPolygons(r.cell) # create a polygon that represents the i-th raster cell
  cropped.polygons <- gIntersection(p, lots.of.polygons) # intersection of i-th raster cell and all SpatialPolygons
  if (is.null(cropped.polygons)) {
    r.results[i] <- NA # if there's no polygon intersecting this raster cell, just return NA ...
  } else{
    r.results[i] <- gArea(cropped.polygons) # ... otherwise return the area
  }
}
plot(r.results)
plot(lots.of.polygons, add=TRUE)
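I can gain a bit of speed by using sapply instead of the for loop, but the bottleneck seems to be somewhere else. The whole approach feels pretty awkward, and I'm wondering whether I've missed something obvious. At first I thought rasterize() should be able to do this easily, but I can't figure out what to put into the fun= argument. Any ideas?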
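You can parallelize your loop using the doSNOW and foreach packages. At best, this speeds up the computation by a factor close to the number of CPUs you use.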
library(doSNOW)
library(foreach)
cl <- makeCluster(4)
# 4 is the number of CPUs used. You can change that according
# to the number of processors you have
registerDoSNOW(cl)
values(r.results) <- foreach(i = 1:ncell(r), .packages = c("raster", "sp", "rgeos"), .combine = c) %dopar% {
  r.cell <- r # fresh copy of the empty raster
  r.cell[i] <- 1 # set the ith cell to 1
  p <- rasterToPolygons(r.cell) # create a polygon that represents the i-th raster cell
  cropped.polygons <- gIntersection(p, lots.of.polygons) # intersection of i-th raster cell and all SpatialPolygons
  if (is.null(cropped.polygons)) {
    NA # if there's no polygon intersecting this raster cell, just return NA ...
  } else{
    gArea(cropped.polygons) # ... otherwise return the area
  }
}
stopCluster(cl) # release the worker processes once the loop is done
plot(r.results)
plot(lots.of.polygons, add=TRUE)
[Edited]
Maybe gIntersection(..., byid = TRUE) together with gUnaryUnion(lots.of.polygons) (they let you treat all cells at once) would be faster than the for loop (if gUnaryUnion() takes too much time, this is a bad idea).
r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
set.seed(1); values(r) <- sample(x=1:1000, size=150)
rr <- rasterToPolygons(r)
# joining intersecting polys and put all polys into single SpatialPolygons
lots.of.polygons <- gUnaryUnion(lots.of.polygons) # in this example, it is unnecessary
gi <- gIntersection(rr, lots.of.polygons, byid = TRUE)
ind <- as.numeric(do.call(rbind, strsplit(names(gi), " "))[,1]) # getting intersected rr's id
r[] <- NA
r[ind] <- sapply(gi@polygons, function(x) slot(x, 'area')) # a bit faster than gArea(gi, byid = T)
plot(r)
plot(lots.of.polygons, add=TRUE)
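To make clear what the intersection-plus-area step computes for a single cell, here is a minimal base-R sketch that needs no packages. It is not how rgeos works internally; it assumes one simple polygon at a time, clips it to the cell rectangle with Sutherland-Hodgman clipping, and measures the result with the shoelace formula. The helper names (poly_area, clip_half_plane, cell_overlap_area) are made up for this illustration. Overlapping polygons would still have to be merged first, which is what gUnaryUnion() does above.

```r
# Shoelace formula: absolute area of a polygon given as an n x 2 vertex matrix.
poly_area <- function(p) {
  if (nrow(p) < 3) return(0)
  x <- p[, 1]; y <- p[, 2]
  j <- c(2:nrow(p), 1)
  abs(sum(x * y[j] - x[j] * y)) / 2
}

# Clip polygon p against one half-plane (one step of Sutherland-Hodgman):
# keep the side where sign * (coordinate[axis] - value) >= 0.
clip_half_plane <- function(p, axis, value, sign) {
  n <- nrow(p)
  if (n == 0) return(p)
  out <- matrix(numeric(0), ncol = 2)
  d <- sign * (p[, axis] - value) # >= 0 means "inside"
  for (i in seq_len(n)) {
    j <- if (i == n) 1 else i + 1
    if (d[i] >= 0) out <- rbind(out, p[i, ])
    if ((d[i] >= 0) != (d[j] >= 0)) { # edge i -> j crosses the boundary
      t <- d[i] / (d[i] - d[j])
      out <- rbind(out, p[i, ] + t * (p[j, ] - p[i, ]))
    }
  }
  out
}

# Area of polygon p that falls inside the cell [xmin, xmax] x [ymin, ymax].
cell_overlap_area <- function(p, xmin, xmax, ymin, ymax) {
  p <- clip_half_plane(p, 1, xmin,  1)
  p <- clip_half_plane(p, 1, xmax, -1)
  p <- clip_half_plane(p, 2, ymin,  1)
  p <- clip_half_plane(p, 2, ymax, -1)
  poly_area(p)
}

# The first example polygon (a rectangle from x = 50..100, y = 15..35) ...
p1 <- matrix(c(50, 100, 100, 50, 50, 15, 15, 35, 35, 15), nrow = 5, ncol = 2)
# ... against a 12 x 9 cell spanning x = 48..60, y = 9..18 (assumed cell bounds)
a <- cell_overlap_area(p1, 48, 60, 9, 18)
print(a)            # 30
print(a / (12 * 9)) # fraction of the cell that is covered
```

For concave polygons the clipped ring can contain degenerate zero-width edges, but the shoelace area is unaffected by them.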
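As you mentioned in your question, an alternative is to exploit rasterization to speed things up. This involves creating two raster files: a "fishnet" raster whose values are the cell numbers, and a raster whose values are the IDs of the polygons. Both need to be "resampled" to a higher resolution than that of the original raster of cells. You can then count how many of the supersampled fishnet cells with the same cell number coincide with cells of the polygon raster that hold a valid (non-zero) id. In practice, something like this would work (note that I slightly changed how the input polygons are built, so as to have a SpatialPolygonsDataFrame).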
library(sp)
library(raster)
library(rgeos)
library(rgdal) # provides writeOGR(), used below
library(data.table)
library(gdalUtils)
# create example raster
r <- raster(nrows=10, ncol=15, xmn=0, ymn=0)
values(r) <- sample(x=1:1000, size=150)
# create example (Spatial) Polygons --> Note that I changed it slightly
# to have a SpatialPolygonsDataFrame with IDs for the different polys
p1 <- Polygons(list(Polygon(coords=matrix(c(50, 100, 100, 50, 50, 15, 15, 35, 35, 15), nrow=5, ncol=2), hole=FALSE)), "1")
p2 <- Polygons(list(Polygon(coords=matrix(c(77, 123, 111, 77, 43, 57, 66, 43), nrow=4, ncol=2), hole=FALSE)), "2")
p3 <- Polygons(list(Polygon(coords=matrix(c(110, 125, 125, 110, 67, 75, 80, 67), nrow=4, ncol=2), hole=FALSE)), "3")
lots.of.polygons <- SpatialPolygons(list(p1, p2, p3), 1:3)
lots.of.polygons <- SpatialPolygonsDataFrame(lots.of.polygons, data = data.frame (id = c(1,2,3)))
crs(lots.of.polygons) <- crs(r) # copy crs from raster to polygons (please ignore any potential problems related to projections etc. for now)
# plot both
plot(r) #values in this raster for illustration purposes only
plot(lots.of.polygons, add = TRUE)
# Create a spatial grid dataframe and convert it to a "raster fishnet"
# Consider also that creating a SpatialGridDataFrame could be faster
# than using "rasterToPolygons" in your original approach !
cs <- res(r) # cell size.
cc <- c(extent(r)@xmin,extent(r)@ymin) + (cs/2) # corner of the grid.
cd <- ceiling(c(((extent(r)@xmax - extent(r)@xmin)/cs[1]), # construct grid topology: number of columns and rows,
((extent(r)@ymax - extent(r)@ymin)/cs[2]))) # without subtracting 1, so that the grid covers every cell of r
# Define grd characteristics
grd <- GridTopology(cellcentre.offset = cc, cellsize = cs, cells.dim = cd)
#transform to spatial grid dataframe. each cell has a sequential numeric id
sp_grd <- SpatialGridDataFrame(grd,
                               data = data.frame(id = seq(1,(prod(cd)),1)), # ids are numbers between 1 and ns*nl
                               proj4string = crs(r) )
# Save the "raster fishnet"
out_raster <- setValues(raster(sp_grd), sp_grd@data$id) # written without %>% so that magrittr is not required
temprast <- tempfile(tmpdir = tempdir(), fileext = ".tif")
writeRaster(out_raster, temprast, overwrite = TRUE)
# "supersample" the raster of the cell numbers
ss_factor = 20 # this indicates how much you increase resolution of the "cells" raster
# the higher this is, the lower the error in computed percentages
temprast_hr <- tempfile(tmpdir = tempdir(), fileext = ".tif")
super_raster <- gdalwarp(temprast, temprast_hr, tr = res(r)/ss_factor, output_Raster = TRUE, overwrite = TRUE)
# Now rasterize the input polygons with same extent and resolution of super_raster
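tempshapefile <- file.path(tempdir(), "tempshape.shp") # writeOGR() does not return the file name, so build the path explicitly
rgdal::writeOGR(obj = lots.of.polygons, dsn = tempdir(), layer = "tempshape", driver = "ESRI Shapefile", overwrite_layer = TRUE)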
temprastpoly <- tempfile(tmpdir = tempdir(), fileext = ".tif")
rastpoly <- gdal_rasterize(tempshapefile, temprastpoly, tr = raster::res(super_raster),
te = extent(super_raster)[c(1,3,2,4)], a = 'id', output_Raster = TRUE)
# Compute Zonal statistics: for each "value" of the supersampled fishnet raster,
# compute the number of cells which have a non-zero value in the supersampled
# polygons raster (i.e., they belong to one polygon), and divide by the maximum
# possible of cells (equal to ss_factor^2)
cell_nos <- getValues(super_raster)
polyid <- getValues(rastpoly)
rDT <- data.table(polyid_fc = as.numeric(polyid), cell_nos = as.numeric(cell_nos))
setkey(rDT, cell_nos)
# Use data.table to quickly summarize over cell numbers
# (percentage of the ss_factor^2 sub-cells of each cell that fall inside a polygon)
count <- rDT[, .(polyid_fc = 100 * sum(polyid_fc > 0, na.rm = TRUE) / (ss_factor^2)),
             by = cell_nos]
# Put the results back in the SpatialGridDataFrame and plot
sp_grd@data <- data.frame(count)
sp_grd$polyid_fc[sp_grd$polyid_fc == 0] <- NA
spplot(sp_grd, zcol = 'polyid_fc')
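This should be quite fast, and it should also scale well with the number of polygons.
The caveat is that you have to live with an approximation in the computed percentages! The error you make depends on how strongly you "supersample" the raster (set to 20 here by the ss_factor variable). Higher supersampling factors give a lower error, at the cost of more memory and processing time.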
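A small base-R sketch of that trade-off, independent of the GDAL workflow above: it splits one cell into ss_factor x ss_factor sub-cells and counts how many sub-cell centres fall inside a polygon, using a plain ray-casting point-in-polygon test. The function names and the test polygon are made up for illustration; the polygon covers exactly 33% of a 10 x 10 cell, so the estimates can be compared with the true value.

```r
# Even-odd (ray casting) point-in-polygon test, vectorised over the points.
# px, py are the polygon vertices (ring does not need to be closed).
point_in_poly <- function(x, y, px, py) {
  n <- length(px)
  inside <- rep(FALSE, length(x))
  j <- n
  for (i in seq_len(n)) {
    crosses <- ((py[i] > y) != (py[j] > y)) &
      (x < (px[j] - px[i]) * (y - py[i]) / (py[j] - py[i]) + px[i])
    inside <- xor(inside, crosses) # each crossing flips inside/outside
    j <- i
  }
  inside
}

# Estimated fraction of the cell [xmin, xmax] x [ymin, ymax] covered by the
# polygon, based on the centres of ss_factor x ss_factor sub-cells.
covered_fraction <- function(xmin, xmax, ymin, ymax, px, py, ss_factor) {
  xs <- xmin + (seq_len(ss_factor) - 0.5) * (xmax - xmin) / ss_factor
  ys <- ymin + (seq_len(ss_factor) - 0.5) * (ymax - ymin) / ss_factor
  g <- expand.grid(x = xs, y = ys)
  mean(point_in_poly(g$x, g$y, px, py))
}

# Hypothetical polygon covering the strip x < 3.3 of a 10 x 10 cell (true fraction: 0.33)
px <- c(-1, 3.3, 3.3, -1)
py <- c(-1, -1, 11, 11)
est <- sapply(c(2, 20, 100), function(s) covered_fraction(0, 10, 0, 10, px, py, s))
print(est) # 0.50 0.35 0.33
```

The number of sub-cells grows with the square of ss_factor, which is where the extra memory and time go.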
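I was also thinking that one way to speed up the "vector based" approaches could be an a-priori analysis of the distances between the raster cells and the different polygons, so that you only look for intersections between a cell and its "nearby" polygons. Maybe you could exploit the bboxes of the polygons to find the interesting cells...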
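A rough base-R sketch of the bbox idea, under the assumption that the grid is the example raster (extent 0..180 x 0..90, 15 columns, 10 rows, cells numbered row by row from the top-left as raster does). The function name and the hard-coded bounding boxes are for illustration only; with sp objects the boxes could be taken from bbox() on each polygon.

```r
# Cell numbers of all grid cells that overlap a bounding box
# bb = c(xmin, ymin, xmax, ymax). Grid origin is the top-left corner.
cells_touching_bbox <- function(bb, xmin, ymax, cs, ncols, nrows) {
  col_from <- max(1, floor((bb[1] - xmin) / cs[1]) + 1)
  col_to   <- min(ncols, ceiling((bb[3] - xmin) / cs[1]))
  row_from <- max(1, floor((ymax - bb[4]) / cs[2]) + 1)
  row_to   <- min(nrows, ceiling((ymax - bb[2]) / cs[2]))
  if (col_from > col_to || row_from > row_to) return(integer(0))
  g <- expand.grid(col = col_from:col_to, row = row_from:row_to)
  (g$row - 1) * ncols + g$col
}

# Bounding boxes of the three example polygons (read off their coordinates)
bboxes <- list(c(50, 15, 100, 35), c(77, 43, 123, 66), c(110, 67, 125, 80))

candidates <- sort(unique(unlist(lapply(
  bboxes, cells_touching_bbox,
  xmin = 0, ymax = 90, cs = c(12, 9), ncols = 15, nrows = 10
))))
print(length(candidates)) # 37 of the 150 cells need an intersection test
```

Only the candidate cells would then be passed to the expensive intersection step; all other cells can be set to NA directly.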
HTH,
Lorenzo