How to process a LAScatalog in parallel with lidR in R

I used to process a LiDAR catalog with the following code (using the LAScatalog processing engine from the great lidR package):

library(lidR)
library(parallel)  # provides detectCores()
library(stringr)   # provides str_pad()

lasdir <- "D:\\LAS\\"
output <- "D:\\LAS\\PRODUCTS\\"
epsg = "+init=epsg:25829"
res = 1

no_cores <- detectCores()
cat <- lascatalog(lasdir = lasdir, 
                  outputdir = output, 
                  pattern = '*COL.laz$|*COL.LAZ$',
                  catname = "Catalog",
                  clipcat = FALSE, clipcatbuf = FALSE, clipbuf = 1000, clipcatshape = clipcatshape,
                  cat_chunk_buffer = 20,
                  cores = no_cores, progress = TRUE,
                  laz_compression = TRUE, epsg = epsg,
                  retilecatalog = FALSE, tile_chunk_buffer = 10,
                  tile_chunk_size = 1000,
                  filterask = FALSE,
                  filter = "-keep_first -drop_z_below 2")

DEM_output <- paste0(output,"DEM_", str_pad(res, 3, "left", pad = "0"), "/")
opt_output_files(cat) <- paste0(DEM_output,"{ORIGINALFILENAME}") #set filepaths
DEM <- grid_terrain(cat, res = res, algorithm = knnidw(k = 5, p = 2))

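The library has since been updated, and the cores argument no longer seems to work: the process still runs, but no longer in parallel. A message states: Option no longer supported. See ?lidR-parallelism.

How can I process the catalog in parallel now?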

Since lidR 2.1.0 (July 2019), the opt_cores() function has been deprecated. See the changelog.

The strategy used to process the tiles in parallel must now be explicitly declared by users. This is anyway how it should have been designed from the beginning! For users, restoring the exact former behavior implies only one change.

In versions < 2.1.0 the following was correct:

library(lidR)
ctg <- catalog("folder/")
opt_cores(ctg) <- 4L
hmean <- grid_metrics(ctg, mean(Z))

In versions >= 2.1.0 this must be explicitly declared with the future package:

library(lidR)
library(future)
plan(multisession)
ctg <- catalog("folder/")
hmean <- grid_metrics(ctg, mean(Z))
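
Applied to a workflow like the one in the question, the change might look like the sketch below. It assumes lidR >= 2.1.0 with the future package installed. The helper name build_dem, its arguments and the example paths are placeholders chosen for illustration, not part of lidR, and the catalog options are set with the opt_*() functions instead of the custom lascatalog() wrapper used in the question. In lidR 4.x, grid_terrain() has been superseded by rasterize_terrain(), so the last call may need adjusting there.

```r
library(lidR)
library(future)

# Hypothetical helper (not part of lidR): builds a DEM for each file of a
# catalog, with the chunks processed in parallel.
build_dem <- function(las_dir, out_dir, res = 1, workers = 2L,
                      las_filter = "-keep_first -drop_z_below 2") {
  # Declare the parallel strategy explicitly; this replaces the old `cores` option.
  old_plan <- plan(multisession, workers = workers)
  on.exit(plan(old_plan), add = TRUE)  # restore the previous strategy on exit

  ctg <- readLAScatalog(las_dir)
  opt_chunk_buffer(ctg) <- 20          # buffer around each chunk, in map units
  opt_filter(ctg) <- las_filter        # filter applied when points are read
  opt_output_files(ctg) <- file.path(out_dir, "{ORIGINALFILENAME}")

  grid_terrain(ctg, res = res, algorithm = knnidw(k = 5, p = 2))
}

# Example call (paths are placeholders):
# dem <- build_dem("D:/LAS/", "D:/LAS/PRODUCTS/DEM_001", workers = availableCores())
```

Wrapping plan() in the function and restoring it on exit keeps the parallel strategy from leaking into the rest of the session; calling plan(multisession) once at the top of a script, as in the changelog example, works just as well.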

In addition, this is fully documented in the manual page named lidR-parallelism.

?lidR::`lidR-parallelism`

chunk-based parallelism

When processing a LAScatalog, the internal engine splits the dataset into chunks and each chunk is read and processed sequentially in a loop. But actually this loop can be parallelized with the future package. By default the chunks are processed sequentially, but they can be processed in parallel by registering an evaluation strategy. For example, the following code is evaluated sequentially:

ctg <- readLAScatalog("folder/")
out <- grid_metrics(ctg, mean(Z))

But this one is evaluated in parallel with two cores:

library(future)
plan(multisession, workers = 2L)
ctg <- readLAScatalog("folder/")
out <- grid_metrics(ctg, mean(Z))

With chunk-based parallelism any algorithm can be parallelized by processing several subsets of a dataset [...]

To take full advantage of this new syntax, you need to understand how future works. See the future package documentation.
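
As a rough illustration of what registering a strategy does, independent of lidR, the following minimal sketch uses only the future package. The variable names are arbitrary and the worker count of two is an assumption made for the example.

```r
library(future)

# Register a strategy: two background R sessions act as workers.
plan(multisession, workers = 2L)
n_workers <- nbrOfWorkers()

# A future is evaluated according to the registered plan, here in a worker
# session, so the process ID it reports differs from the main session's.
f <- future(Sys.getpid())
worker_pid <- value(f)    # blocks until the result is available
main_pid <- Sys.getpid()

# Return to sequential evaluation once the parallel work is done.
plan(sequential)
```

lidR relies on the same mechanism: whichever plan is active when a LAScatalog is processed determines how the chunks are dispatched.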