R: lapply function - skipping the current function loop

Question

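I am using the lapply function over a list of multiple files. Is there a way to skip the function on the current file without returning anything, and just move on to the next file in the list?

To be precise, I have an if statement that checks a condition, and I would like to skip to the next file if the statement returns FALSE.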

Answer 1

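You can define a custom function to use in your call to lapply(). Here is some sample code which iterates over a list of files and processes a file only if its name does not contain the number 3 (a bit contrived, but hopefully it gets the point across):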

files <- as.list(c("file1.txt", "file2.txt", "file3.txt"))

fun <- function(x) {
    test <- grep("3", x)                     # check for files with "3" in their name
    if (length(test) == 0) {                 # replace with your own condition here
        # process the file here
    }
    # otherwise do not process the file; the function then returns NULL
}

result <- lapply(files, fun)                 # call lapply with the custom function
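
To see what a skipped element looks like, here is a small runnable sketch of the same idea. The function name `process_unless_3` and the `toupper()` "processing" step are made up for illustration and stand in for real file handling. In R, an `if` with no `else` whose condition is FALSE evaluates to NULL, so the skipped file still occupies a slot in the result.

```r
files <- list("file1.txt", "file2.txt", "file3.txt")

# Hypothetical processing step: upper-case the name instead of reading the file.
process_unless_3 <- function(x) {
  if (!grepl("3", x, fixed = TRUE)) {
    toupper(x)
  }
  # no else branch: the function returns NULL for skipped files
}

result <- lapply(files, process_unless_3)

length(result)        # one slot per input, including the skipped file
is.null(result[[3]])  # the skipped file shows up as NULL
```

So the function body is skipped for "file3.txt", but lapply still records a placeholder for it that you have to filter out afterwards.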

Answer 2

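lapply will always return a list the same length as the X it is given. You can simply set the skipped items to something that you can filter out later.

For example, if you have the function parsefile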

parsefile <- function(x) {
  if (x >= 0) {
    x
  } else {
    NULL
  }
}

Edit: as Florent Angly shows, you should replace NULL with NA.

and you run it on the vector runif(10,-5,5)

result<-lapply(runif(10,-5,5), parsefile)

Then you'll have a list full of answers and NULLs.

You can subset out the NULLs by doing...

result[!vapply(result, is.null, logical(1))]
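
For what it's worth, base R's `Filter()` with `Negate(is.null)` does the same subsetting in one call. Below is a minimal sketch with a fixed input vector (my own choice, used instead of `runif()` so the outcome is reproducible):

```r
parsefile <- function(x) {
  if (x >= 0) x else NULL
}

input <- c(-2, 1.5, -0.5, 3)          # fixed input so the outcome is reproducible
result <- lapply(input, parsefile)    # list of 4: NULL, 1.5, NULL, 3

# Filter() keeps only the elements for which the predicate returns TRUE
kept <- Filter(Negate(is.null), result)
length(kept)                          # the two non-negative values remain
```

Both forms give the same list; which one to use is mostly a matter of taste.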

Answer 3

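As others have already answered, I do not think you can move on to the next iteration without returning something when using the *apply family of functions.

In such cases, I use Dean MacGregor's method, with one small change: I use NA instead of NULL, which makes filtering the results easier.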

files <- list("file1.txt", "file2.txt", "file3.txt")

parse_file <- function(file) {
  if(file.exists(file)) {
    readLines(file)
  } else {
    NA
  }
}

results <- lapply(files, parse_file)
results <- results[!is.na(results)]
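
Here is a self-contained sketch of that pattern that you can run without having the files on disk. The temporary files and their contents are assumptions made purely for the demo. Note that `is.na()` on a list flags only elements that are a single NA, which is why a multi-line result is kept.

```r
parse_file <- function(file) {
  if (file.exists(file)) readLines(file) else NA
}

existing <- tempfile(fileext = ".txt")   # a real file created just for this demo
writeLines(c("first line", "second line"), existing)
missing <- tempfile(fileext = ".txt")    # a path that is never created

results <- lapply(list(existing, missing), parse_file)
results <- results[!is.na(results)]      # drops the NA left by the missing file

length(results)                          # only the existing file remains
unlink(existing)                         # clean up the demo file
```

One caveat: the filter also drops any element that is itself a lone NA, so this only works when a successful parse can never return a single NA.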

A quick benchmark

res_na   <- list("a",   NA, "c")
res_null <- list("a", NULL, "c")
microbenchmark::microbenchmark(
  na = res_na[!is.na(res_na)],
  null = res_null[!vapply(res_null, is.null, logical(1))]
)

illustrates that the NA solution is quite a bit faster than the solution that uses NULL:

Unit: nanoseconds
expr  min   lq    mean median   uq   max neval
  na    0    1  410.78    446  447  5355   100
null 3123 3570 5283.72   3570 4017 75861   100
