How to average rasters within the for-loop that creates them? (R)

I have several directories, each containing 700+ binary-encoded rasters, and I average the output rasters for each directory. At the moment, however, I create the rasters one by one in a for loop, then load the newly created rasters back into R and sum them to get the monthly rainfall total.

However, since I don't need the individual rasters, only the averaged raster, I have a hunch that I could do all of this within one loop and save only the output average raster instead of saving every raster, but I am struggling with how to program this in R.

library(raster) # raster(), extent(), crop(), mask(), stack() and writeRaster() come from the raster package

setwd("~/Desktop/CMORPH/Levant-Clip/200001")

dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)

for (i in 1:length(path)) {
  files <- bzfile(path[i], "rb")
  data <- readBin(files, what="double", endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # close the connection; with 700+ files, leaving them open exhausts R's connection limit
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
  y <- matrix(data, ncol=1649, nrow=4948)
  r <- raster(y)
  e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 
  tr <- t(r) #transpose 
  re <- setExtent(tr,extent(e)) ### set the extent to the raster
  ry <- flip(re, direction = 'y')
  projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
  C_Lev <- crop(ry, Levant) ### Clip to Levant
  M_C_Lev<-mask(C_Lev, Levant)
  writeRaster(M_C_Lev, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original
}
# 
raspath <- list.files('~/Desktop/CMORPH/Levant-Clip/200001', pattern="*.tif", full.names=T, recursive=T)
rasstk <- stack(raspath)
sum200001 <- sum(rasstk)
writeRaster(sum200001, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original

Currently this code takes about 75 minutes to run, and I have roughly 120 more directories to process, so I am looking for a faster solution.

Thanks for any comments and input. Best, Evan

Expanding on my earlier comment, you could try:

setwd("~/Desktop/CMORPH/Levant-Clip/200001")

dir.output <- '~/Desktop/CMORPH/Levant-Clip/200001' ### change as needed to give output location
path <- list.files("~/Desktop/CMORPH/MonthlyCMORPH/200001",pattern="*.bz2", full.names=T, recursive=T)
raster_list = list()
for (i in 1:length(path)) {
  files <- bzfile(path[i], "rb")
  data <- readBin(files, what="double", endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # close the connection; with 700+ files, leaving them open exhausts R's connection limit
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs
  y <- matrix(data, ncol=1649, nrow=4948)
  r <- raster(y)
  if (i == 1) {
    e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 

  }
  tr <- t(r) #transpose 
  re <- setExtent(tr,extent(e)) ### set the extent to the raster
  ry <- flip(re, direction = 'y')
  projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
  C_Lev <- crop(ry, Levant) ### Clip to Levant
  M_C_Lev<-mask(C_Lev, Levant)
  raster_list[[i]] = M_C_Lev
}
# 

rasstk <- stack(raster_list, quick = TRUE) # OR rasstk <- brick(raster_list, quick = TRUE)
avg200001<-mean(rasstk)
writeRaster(avg200001, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original

Using the "quick" option in stack should definitely speed things up, especially when you have many rasters.
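As a rough check of how much the option buys you, here is a timing sketch (assuming raster_list from the loop above is already in memory; note that with quick = TRUE the layers are not checked for matching extent/resolution, so you have to guarantee they line up):

t_default <- system.time(rasstk_checked <- stack(raster_list))
t_quick   <- system.time(rasstk_fast    <- stack(raster_list, quick = TRUE))
rbind(default = t_default, quick = t_quick) # compare the elapsed times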

Another possibility is to compute the average first and do the "spatial processing" afterwards. For example:

for (i in 1:length(path)) {
  files <- bzfile(path[i], "rb")
  data <- readBin(files, what="double", endian = "little", n = 4948*1649, size=4) #Mode of the vector to be read
  close(files) # close the connection after reading
  data[data == -999] <- NA #convert missing data from -999 (CMORPH notation) to NAs

  if (i == 1) {
    totdata   <- data
    num_nonNA <- as.numeric(!is.na(data))
  } else {
    totdata <- rowSums(cbind(totdata, data), na.rm = TRUE)
    # We have to count the number of "valid" entries so that the average is correct!
    num_nonNA <- rowSums(cbind(num_nonNA, as.numeric(!is.na(data))), na.rm = TRUE)
  }
}

avg_data = totdata/num_nonNA # Compute the average

# Now do the "spatial" processing

y<-matrix(avg_data, ncol=1649, nrow=4948)
r <- raster(y)
e <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info 
tr <- t(r) #transpose 
re <- setExtent(tr,extent(e)) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"
C_Lev <- crop(ry, Levant) ### Clip to Levant (crop the raster built from avg_data, not the raw vector)
M_C_Lev<-mask(C_Lev, Levant)
writeRaster(M_C_Lev, paste(dir.output, basename(path[i]), sep = ''), format = 'GTiff', overwrite = T) ###the basename allows the file to be named the same as the original

This could be faster or slower, depending on how much you are cropping the original data.
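If you want to quantify that trade-off, one option is to time the crop/mask step on a single raster and multiply by the number of files; a sketch reusing ry and Levant from the code above:

t_spatial <- system.time({
  C_Lev   <- crop(ry, Levant)
  M_C_Lev <- mask(C_Lev, Levant)
})
t_spatial["elapsed"] * length(path) # rough cost of cropping/masking every file in a directory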

HTH,

Lorenzo

I am adding another answer to clarify and simplify things a bit, also in relation to the comments in chat. The code below should do what you ask: i.e., cycle over the files, read the "data", compute the sum over all files, and transform it to a raster with the specified dimensions.

Note that for testing purposes here I replaced your cycle over file names with a simple 1-to-720 loop, and I "simulate" reading the files by creating, at each iteration, an array of the same length as yours, filled with values from 1 to 4 plus some NAs!

totdata <- array(dim = 4948*1649)  # Define Dummy array
for (i in 1:720) {
  message("Working on file: ", i)
  data <- array(rep(c(1,2,3,4),4948*1649/4), dim = 4948*1649) # Create a "fake" 4948*1649 array  each time to simulate data reading
  data[1:1000] <- -999   # Set some values to -999 (missing-data flag)
  data[data == -999] <- NA # convert missing data from -999 to NA

  totdata <- rowSums(cbind(totdata, data), na.rm = T)   # Let's sum the current array with the cumulative sum so far
}

# Now reshape to matrix and convert to raster, etc.
y  <- matrix(totdata, ncol=1649, nrow=4948)
r  <- raster(y)
e  <- extent(-180, 180, -90, 83.6236) ### choose the extent based on the netcdf file info
tr <- t(r) #transpose
re <- setExtent(tr,e) ### set the extent to the raster
ry <- flip(re, direction = 'y')
projection(ry) <- "+proj=longlat +datum=WGS84 +ellps=WGS84"

This produces a "proper" raster:

> ry
class       : RasterLayer 
dimensions  : 1649, 4948, 8159252  (nrow, ncol, ncell)
resolution  : 0.07275667, 0.1052902  (x, y)
extent      : -180, 180, -90, 83.6236  (xmin, xmax, ymin, ymax)
coord. ref. : +proj=longlat +datum=WGS84 +ellps=WGS84 +towgs84=0,0,0 
data source : in memory
names       : layer 
values      : 0, 2880  (min, max)

containing the sum of the different arrays: you can notice that the maximum value is 720 * 4 = 2880 (the only caveat: if a cell is NA in every file, you will get 0 instead of NA).
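If that caveat matters for your data, you can carry a per-cell counter of valid observations (the same num_nonNA trick from the previous answer) and restore the NAs afterwards. A condensed, runnable sketch of the dummy loop with the counter added (3 iterations instead of 720, purely for illustration):

n_cells   <- 4948 * 1649
totdata   <- numeric(n_cells)  # running sum
num_nonNA <- numeric(n_cells)  # number of valid (non-NA) observations per cell

for (i in 1:3) {
  data <- array(rep(c(1, 2, 3, 4), n_cells / 4), dim = n_cells) # fake data
  data[1:1000] <- NA                     # simulate always-missing cells
  num_nonNA <- num_nonNA + !is.na(data)  # update the counter
  data[is.na(data)] <- 0                 # NA-safe contribution to the sum
  totdata <- totdata + data              # cumulative sum
}

totdata[num_nonNA == 0] <- NA            # cells that were never observed stay NA, not 0
avg_data <- totdata / num_nonNA          # mean over the valid files only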

On my laptop, this runs in about 5 minutes!

In practice:

  1. To avoid memory problems, I do not read all the data into memory. Each of your arrays is more or less 64 MB, so I cannot load them all and then sum them (unless I had 50 GB of RAM to throw at the problem, and even then it would be slow). Instead, I exploit the associative property of the sum by computing a "cumulative" sum at each cycle. This way, you only ever work with two 8-million-element arrays at a time: the one you read from file "i", and the one containing the current sum.
  2. Here, to avoid unnecessary computations, I directly sum the one-dimensional arrays obtained from reading the binary files. You don't need to reshape the arrays to matrices within the loop, because you can do that once on the final "summed" array and then convert it to matrix form (see the small check after this list).
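A tiny check of point 2 with toy vectors (not the real data): summing the one-dimensional vectors and reshaping afterwards gives the same matrix as reshaping each vector first and then summing.

a <- 1:12
b <- 13:24
m1 <- matrix(a + b, ncol = 3)                   # sum first, reshape after
m2 <- matrix(a, ncol = 3) + matrix(b, ncol = 3) # reshape first, sum after
identical(m1, m2)                               # TRUE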

I hope this works for you and that I am not missing something obvious!

As far as I can tell, if it is still slow using this approach, then you have a problem elsewhere (for example in the data reading: over 720 files, spending 3 seconds reading each file already means about 35 minutes of processing time).
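To check whether reading alone is the bottleneck, you could time the decompression and read of a single file; a sketch reusing path and the readBin call from your question:

system.time({
  con  <- bzfile(path[1], "rb")
  data <- readBin(con, what = "double", endian = "little", n = 4948*1649, size = 4)
  close(con)
})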

HTH,

Lorenzo