Print current data.frame row (aka progress) in purrr::pmap
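I'm trying to understand purrr and I'm currently working with pmap. pmap can be used to call a predefined function, using the values in each row of a data frame as the arguments to the function call. I would like to know the current state, because my data.frames can have several thousand rows.
How can I print the row pmap is currently running on, possibly together with the total number of rows in the data.frame?
I tried to include a counter as in a for loop, and I also tried to capture the current row using
current <- data.frame(...)
and then row.names(current)
(idea from here: https://blog.az.sg/posts/map-and-walk/)
but in both cases it always prints 1.
Thanks for helping.
For reproducibility, let's use the code from the question that brought me to purrr::pmap():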
library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris) # let's assume these are different input tables
# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  Trees = c(10, 20),
  Importance = c("none", "impurity"),
  Classification = TRUE,
  Repeats = 1:5,
  Target = "Species")
# the function to be called for each row of the `hyper_grid` df
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target,
    data = Input_list[[Input_table]], # referring to the named object in the list
    num.trees = Trees,
    importance = Importance,
    classification = Classification) # otherwise regression is performed
  data.frame(Prediction_error = RF_train$prediction.error,
             True_positive = RF_train$confusion.matrix[1])
}
# the pmap call: each row of hyper_grid supplies the arguments for one call of the function
hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
I tried two things:
counter <- 0
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  counter <- counter + 1
  print(paste(counter, "of", nrow(hyper_grid)))
  # rest of the function
}
# and also
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ...) {
  current <- data.frame(...)
  print(paste(row.names(current), "of", nrow(hyper_grid)))
  # rest of the function
}
# both return
> hyper_grid$res <- purrr::pmap(hyper_grid, fit_and_extract_metrics)
[1] "1 of 40"
[1] "1 of 40"
[1] "1 of 40"
...
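Both attempts print 1 because of how R scopes and builds objects, not because of pmap itself: counter <- counter + 1 creates a new local variable on every call and leaves the outer counter at 0, and data.frame(...) builds a fresh one-row data frame whose row name is always "1". A minimal sketch of a counter that does advance, using the super-assignment operator <<-; the names grid and count_calls are hypothetical stand-ins for hyper_grid and the real fitting function:

```r
library(purrr)

# hypothetical stand-in for hyper_grid
grid <- data.frame(x = 1:3, y = 4:6)

counter <- 0
count_calls <- function(x, y) {
  # `<-` would create a local copy that is discarded after each call;
  # `<<-` updates the `counter` defined outside the function instead
  counter <<- counter + 1
  message(counter, " of ", nrow(grid))
  counter
}

# one call per row; `seen` collects the counter value of each call
seen <- pmap_dbl(grid, count_calls)
```

This relies on a variable outside the function being modified as a side effect, so the counter has to be reset to 0 before each new pmap run.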
Since you are already using pmap, the easiest approach is to pass the row names along as well.
You could do something like
hyper_grid$res <- purrr::pmap(cbind(hyper_grid, .row=rownames(hyper_grid)), fit_and_extract_metrics)
This just adds a .row column containing the row names. Then, in your iterating function, you can do
fit_and_extract_metrics <- function(Target, Input_table, Trees, Importance, Classification, ..., .row) {
  print(paste(.row, "of", nrow(hyper_grid)))
  # rest of the function
}
Note that I added a .row parameter to the function to capture the new column we added.
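To check the pattern without fitting any models, here is a small self-contained sketch. toy_grid and report_row are hypothetical names used only for illustration, and the index comes from seq_len(nrow(...)) instead of rownames(), an assumption made here so that .row is an integer position even if the data frame carries non-default row names:

```r
library(purrr)

# hypothetical small grid standing in for hyper_grid
toy_grid <- expand.grid(Trees = c(10, 20), Repeats = 1:3)

# hypothetical stand-in for fit_and_extract_metrics: no model is fitted,
# it only reports progress and echoes its inputs
report_row <- function(Trees, ..., .row) {
  message(.row, " of ", nrow(toy_grid))
  data.frame(row = .row, trees = Trees)
}

# add the row index as an extra column; pmap matches it to `.row` by name,
# and the unused Repeats column is absorbed by `...`
res <- pmap(
  cbind(toy_grid, .row = seq_len(nrow(toy_grid))),
  report_row
)
```

message() writes to stderr, so the progress lines stay separate from anything the function prints or returns.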
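Note that map() and walk() have versions called imap() and iwalk() that make it easier to get at the iterator, but pmap does not have an ipmap, probably because you already have to do all the work of building the argument list, so you may as well pass in whatever names or indexes you want.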
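For comparison, a brief sketch of what imap() provides for a single input; the tables list and the strings in it are made up for illustration. The second argument of the function receives the element's name when the input is named, and its integer position otherwise:

```r
library(purrr)

# hypothetical named inputs
tables <- list(iris1 = "first table", iris2 = "second table")

# with names, the second argument is the element's name
by_name <- imap_chr(tables, function(value, name) paste(name, "->", value))

# without names, the second argument is the integer position
by_position <- imap_chr(c("a", "b", "c"), function(value, i) paste(i, "of 3:", value))
```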