R ranger confusion.matrix is larger than expected when using expand.grid and purrr::pmap
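Sorry for all the purrr-related questions today; I am still trying to figure out how to use it effectively.
With some help from SO, I managed to get random forest ranger models running based on input values taken from a data.frame. This is done using purrr::pmap. However, I don't understand how the return value is generated from the called function. Consider this example: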
library(ranger)
data(iris)
Input_list <- list(iris1 = iris, iris2 = iris) # let's assume these are different input tables
# the data.frame with the values for the function
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species")
> hyper_grid
  Input_table mtry Classification  Target
1       iris1    1           TRUE Species
2       iris2    1           TRUE Species
3       iris1    2           TRUE Species
4       iris2    2           TRUE Species
# the function to be called for each row of the `hyper_grid` data frame
fit_and_extract_metrics <- function(Target, Input_table, Classification, mtry, ...) {
  RF_train <- ranger(
    dependent.variable.name = Target,
    mtry = mtry,
    data = Input_list[[Input_table]], # referring to the named object in the list
    classification = Classification)  # otherwise regression is performed
  RF_train$confusion.matrix
}
# pmap calls the function once per row of hyper_grid, passing the columns as named arguments
purrr::pmap(hyper_grid, fit_and_extract_metrics)
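It should return four 3*3 confusion matrices, because iris$Species has 3 levels, not the huge confusion matrices it actually returns. Can anyone explain what is going on here?
The first lines of the output: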
> purrr::pmap(hyper_grid, fit_and_extract_metrics)
[[1]]
     predicted
true 4.4 4.7 4.8 4.9   5 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 5.9   6 6.1 6.2 6.3 6.4
4.3    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.4    1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.5    1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.6    0   1   1   1   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.7    1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.8    0   0   1   3   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0
4.9    0   0   1   2   2   0   0   0   0   0   0   0   0   0   1   0   0   0   0
5      0   0   0   1   9   0   0   0   0   0   0   0   0   0   0   0   0   0   0
5.1    0   0   0   0   0   8   0   0   0   1   0   0   0   0   0   0   0   0   0
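The problem here is that the arguments passed to the function are factors, not character strings: expand.grid converts strings to factors by default, and that trips up the ranger function. To fix it, all you need to do is set stringsAsFactors = FALSE in expand.grid: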
hyper_grid <- expand.grid(
  Input_table = names(Input_list),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species",
  stringsAsFactors = FALSE)
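To illustrate the mechanism, here is a minimal sketch using only base R (the names `grid_default`, `grid_chr` and `target_factor` are made up for this illustration). By default expand.grid turns strings into factors, and a factor used as an index is treated as its integer codes. I have not traced what ranger does internally with a factor in dependent.variable.name, but the Sepal.Length-like labels in the oversized matrix are consistent with column 1 being picked instead of Species.

```r
# Illustrative names only; nothing here calls ranger or purrr.
grid_default <- expand.grid(Target = "Species")
grid_chr <- expand.grid(Target = "Species", stringsAsFactors = FALSE)

# Default behaviour: the string column becomes a factor.
print(class(grid_default$Target))
# With stringsAsFactors = FALSE it stays a character vector.
print(class(grid_chr$Target))

# "Species" is the only level of this factor, so its integer code is 1.
target_factor <- grid_default$Target[1]
print(as.integer(target_factor))

# Indexing with the factor uses that integer code, so column 1
# (Sepal.Length) is selected rather than the column named "Species".
print(identical(iris[[target_factor]], iris$Sepal.Length))

# Indexing with the character value selects the intended column.
print(identical(iris[[as.character(target_factor)]], iris$Species))
```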
With stringsAsFactors = FALSE, the pmap call returns:
[[1]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         46         4
  virginica       0          4        46

[[2]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         46         4
  virginica       0          5        45

[[3]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47

[[4]]
            predicted
true         setosa versicolor virginica
  setosa         50          0         0
  versicolor      0         47         3
  virginica       0          3        47
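If the grid already exists, or is built by code you cannot change, an alternative is to convert its factor columns back to character before handing it to pmap. A minimal sketch in base R, assuming a grid shaped like the one in the question (the names `hyper_grid_fct`, `hyper_grid_chr` and `is_fct` are only illustrative):

```r
# A grid built with the default stringsAsFactors = TRUE, as in the question.
hyper_grid_fct <- expand.grid(
  Input_table = c("iris1", "iris2"),
  mtry = c(1, 2),
  Classification = TRUE,
  Target = "Species"
)

# Find the factor columns and convert only those to character,
# leaving the numeric and logical columns untouched.
hyper_grid_chr <- hyper_grid_fct
is_fct <- vapply(hyper_grid_chr, is.factor, logical(1))
hyper_grid_chr[is_fct] <- lapply(hyper_grid_chr[is_fct], as.character)

# Inspect the column types after the conversion.
str(hyper_grid_chr)
```

The converted grid can then be passed to purrr::pmap in the same way as the one created with stringsAsFactors = FALSE.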