Print the current data.frame row, i.e. progress, in purrr::pmap

Question

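I'm trying to get to grips with purrr and am currently working with pmap. pmap can be used to call a predefined function, using the values in each row of a data frame as the arguments of the function call. I'd like to know how far along it is, because my data.frames can have several thousand rows.

How can I print the row pmap is currently working on, ideally together with the total number of rows in the data.frame?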

I tried to include a counter, as you would in a for loop, and also tried to capture the current row using

current <- data.frame(...)

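and then row.names(current)

(idea from here: https://blog.az.sg/posts/map-and-walk/)

but in both cases it always prints 1.

Thanks for your help.

For reproducibility, let's use the code from the question that brought me to purrr::pmap():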

library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris)  # let's assume these are different input tables

# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Importance = c("none", "impurity"),
  Classification = TRUE,
  Repeats = 1:5,
  Target = "Species")

# the function to be called for each row of the `hyper_grid`df
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target, 
    data = Input_list[[Input_table]],  # referring to the named object in the list
    num.trees = Trees, 
    importance = Importance, 
    classification = Classification)  # otherwise regression is performed

  data.frame(Prediction_error = RF_train$prediction.error,
             True_positive = RF_train$confusion.matrix[1])
}

# the pmap call: iterates over the rows of hyper_grid, passing each row's values as arguments to the function
hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)

I tried two things:

counter <- 0
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  counter <- counter + 1
  print(paste(counter, "of", nrow(hyper_grid)))
  # rest of the function
}

# and also 
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  current <- data.frame(...)
  print(paste(row.names(current), "of", nrow(hyper_grid)))
  # rest of the function
}

# both return
> hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
[1] "1 of 40"
[1] "1 of 40"
[1] "1 of 40"
...

Answer 1

Since you are already using pmap, the easiest approach is to pass the row names in as well.

You could do something like

hyper_grid$res <- purrr::pmap(cbind(hyper_grid, .row=rownames(hyper_grid)), fit_and_extract_metrics)

This just adds a .row column containing the row names. Then, in your iteration function, you can do

fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ..., .row) {
  print(paste(.row, "of", nrow(hyper_grid)))
  # rest of the function
}

Note that I added a .row parameter to the function to capture the new column we added.
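To try the idea without fitting any models, here is a minimal self-contained sketch. The names grid, add_with_progress and .total are made up for illustration and are not part of the question's code. It also uses seq_len(nrow(grid)) instead of rownames(), on the assumption that you want a running position; row names keep their original values after subsetting, so they would not count 1..n on a filtered data frame.

```r
library(purrr)

# Toy stand-in for hyper_grid (hypothetical values, no ranger needed)
grid <- expand.grid(a = 1:2, b = c(10, 20))

# Hypothetical worker: reports its position, then does some trivial work.
# .row and .total come after `...`, so they are matched by name only.
add_with_progress <- function(a, b, ..., .row, .total) {
  message(.row, " of ", .total)
  a + b
}

# Add the position and the total as extra columns; pmap passes every
# column of the data frame to the function as a named argument.
res <- pmap(
  cbind(grid, .row = seq_len(nrow(grid)), .total = nrow(grid)),
  add_with_progress
)
```

This prints "1 of 4" through "4 of 4" as it goes. For comparison, the two attempts in the question always print 1 because counter <- counter + 1 only assigns a local copy inside each call, and data.frame(...) builds a fresh one-row data frame whose row name is always "1".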

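Note that map() and walk() have variants called imap() and iwalk() that make it easier to get at the iterator, but pmap has no ipmap, presumably because you already have to do all the work of building the argument list yourself, so you may as well pass in whatever names or indices you want.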
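For reference, this is roughly what the imap() family looks like on a plain vector. The trees vector and the label text are hypothetical; the point is only that the second argument receives the index.

```r
library(purrr)

# Hypothetical input: a plain, unnamed vector rather than a data frame
trees <- c(10, 20, 50)

# imap_chr() calls the function with each element and its index
labels <- imap_chr(trees, function(value, index) {
  paste0(index, " of ", length(trees), ": num.trees = ", value)
})

print(labels)
```

If the vector is named, imap() passes the element's name rather than its position, so use an unnamed vector (or seq_along() with map2()) when you need a numeric counter.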
