How to rename mlr3 task feature values within a pipeline
I have an mlr3 task:
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
library(mlr3)
task <- TaskRegr$new("bmsp", df, target = "data")
How can I rename the value "a" of the feature "v1" to the value "c" within a pipeline?
Code:
library(mlr3)
library(mlr3pipelines)
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
task <- TaskRegr$new("bmsp", df, target = "data")
pop <- po("colapply",
applicator = function(x) ifelse(x == "a", "c", x))
pop$param_set$values$affect_columns = selector_name("v1")
pop$train(list(task))[[1]]$data()
This gives the following output (see column v1, row 2):
data v1 v2
1 3.15 c 1
2 4.11 2 2
3 3.56 c 2
but the desired output is:
data v1 v2
1 3.15 c 1
2 4.11 b 2
3 3.56 c 2
This is straightforward with PipeOpColApply.
We need to define a function that takes the supplied input and performs the requested operation (the applicator).
library(mlr3)
library(mlr3pipelines)
pop <- po("colapply",
applicator = function(x) ifelse(x == "a", "c", x))
We also need to define which columns the function will operate on:
pop$param_set$values$affect_columns = selector_name("v1")
pop$train(list(task))[[1]]$data()
#output
data v1 v2
1: 3.15 c 1
2: 4.11 b 2
3: 3.56 c 2
This is very similar to the example in the function's help page.
Data:
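The `2` in the question's output is consistent with `v1` being stored as a factor rather than as a character vector. That is an assumption about the asker's setup: before R 4.0.0, `data.frame()` converted strings to factors by default, while the session shown below runs R 4.0.2. A minimal base R sketch of the effect, with the factor created explicitly:

```r
# Same toy column as in the question, once as character and once as factor.
v1_chr <- c("a", "b", "a")
v1_fct <- factor(v1_chr)

# Character input: the "no" branch returns the original strings.
res_chr <- ifelse(v1_chr == "a", "c", v1_chr)

# Factor input: ifelse() does not keep the factor labels, so the "no"
# branch contributes the underlying integer codes instead.
res_fct <- ifelse(v1_fct == "a", "c", v1_fct)

print(res_chr)  # "c" "b" "c"
print(res_fct)  # "c" "2" "c"
```

So the same applicator can produce either result depending on how the column is stored.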
df <- data.frame(v1 = c("a", "b", "a"),
v2 = c(1, 2, 2),
data = c(3.15, 4.11, 3.56))
task <- TaskRegr$new("bmsp", df, target = "data")
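If `v1` does arrive as a factor, one possible workaround is an applicator that relabels the factor level instead of calling `ifelse()` on the factor. The sketch below is an illustration under stated assumptions, not part of the original answer: `rename_value` and its `from`/`to` arguments are hypothetical names, and `stringsAsFactors = TRUE` is used only to reproduce a factor column. The mlr3 part is guarded so that it runs only when both packages are installed.

```r
# Hypothetical helper (not part of mlr3pipelines): rename one value of a
# column, whether it is stored as a factor or as a character vector.
rename_value <- function(x, from = "a", to = "c") {
  if (is.factor(x)) {
    # Relabel the level; the integer codes stay untouched.
    levels(x)[levels(x) == from] <- to
    return(x)
  }
  ifelse(x == from, to, x)
}

# Force a factor column, as data.frame() did by default before R 4.0.0.
df_fct <- data.frame(v1 = c("a", "b", "a"),
                     v2 = c(1, 2, 2),
                     data = c(3.15, 4.11, 3.56),
                     stringsAsFactors = TRUE)

renamed <- rename_value(df_fct$v1)
print(as.character(renamed))

# Plug the helper into a colapply PipeOp, if the packages are available.
if (requireNamespace("mlr3", quietly = TRUE) &&
    requireNamespace("mlr3pipelines", quietly = TRUE)) {
  library(mlr3)
  library(mlr3pipelines)
  task_fct <- TaskRegr$new("bmsp", df_fct, target = "data")
  pop_fct <- po("colapply", applicator = rename_value)
  pop_fct$param_set$values$affect_columns <- selector_name("v1")
  print(pop_fct$train(list(task_fct))[[1]]$data())
}
```

Relabelling the level keeps the column a factor; converting with `as.character()` before `ifelse()` would be an alternative if a character column is acceptable.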
sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)
Matrix products: default
Random number generation:
RNG: Mersenne-Twister
Normal: Inversion
Sample: Rounding
locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252 LC_MONETARY=English_United States.1252 LC_NUMERIC=C LC_TIME=English_United States.1252
attached base packages:
[1] stats4 parallel stats graphics grDevices utils datasets methods base
other attached packages:
[1] mlr3pipelines_0.3.0-9000 mlr3_0.7.0 Biostrings_2.56.0 XVector_0.28.0 IRanges_2.22.2 S4Vectors_0.26.1 BiocGenerics_0.34.0
loaded via a namespace (and not attached):
[1] Biobase_2.48.0 httr_1.4.2 bit64_4.0.5 splines_4.0.2 foreach_1.5.0 prodlim_2019.11.13 assertthat_0.2.1 lgr_0.3.4 askpass_1.1
[10] BiocFileCache_1.12.1 blob_1.2.1 mlr3misc_0.5.0 progress_1.2.2 ipred_0.9-9 backports_1.1.10 pillar_1.4.6 RSQLite_2.2.1 lattice_0.20-41
[19] glue_1.4.2 uuid_0.1-4 pROC_1.16.2 digest_0.6.25 checkmate_2.0.0 colorspace_1.4-1 recipes_0.1.13 Matrix_1.2-18 plyr_1.8.6
[28] timeDate_3043.102 XML_3.99-0.5 pkgconfig_2.0.3 biomaRt_2.44.1 caret_6.0-86 zlibbioc_1.34.0 purrr_0.3.4 scales_1.1.1 gower_0.2.2
[37] lava_1.6.8 tibble_3.0.3 openssl_1.4.3 generics_0.0.2 ggplot2_3.3.2 ellipsis_0.3.1 withr_2.3.0 nnet_7.3-14 paradox_0.4.0-9000
[46] survival_3.1-12 magrittr_1.5 crayon_1.3.4 memoise_1.1.0 nlme_3.1-148 MASS_7.3-51.6 class_7.3-17 tools_4.0.2 data.table_1.13.0
[55] prettyunits_1.1.1 hms_0.5.3 lifecycle_0.2.0 stringr_1.4.0 munsell_0.5.0 glmnet_4.0-2 AnnotationDbi_1.50.3 compiler_4.0.2 tinytex_0.26
[64] rlang_0.4.7 grid_4.0.2 iterators_1.0.12 rstudioapi_0.11 rappdirs_0.3.1 gtable_0.3.0 ModelMetrics_1.2.2.2 codetools_0.2-16 DBI_1.1.0
[73] curl_4.3 reshape2_1.4.4 R6_2.4.1 lubridate_1.7.9 dplyr_1.0.2 bit_4.0.4 biomartr_0.9.2 shape_1.4.5 stringi_1.5.3
[82] Rcpp_1.0.5 vctrs_0.3.4 rpart_4.1-15 dbplyr_1.4.4 tidyselect_1.1.0 xfun_0.18