How to rename mlr3 task feature values within a pipeline

I have an mlr3 task:

df <- data.frame(v1 = c("a", "b", "a"),
                 v2 = c(1, 2, 2),
                 data = c(3.15, 4.11, 3.56))

library(mlr3)
task <- TaskRegr$new("bmsp", df, target = "data")

How can I rename the value "a" of the feature "v1" to "c" within a pipeline?

Code:

library(mlr3)
library(mlr3pipelines)

df <- data.frame(v1 = c("a", "b", "a"),
                 v2 = c(1, 2, 2),
                 data = c(3.15, 4.11, 3.56))

task <- TaskRegr$new("bmsp", df, target = "data")


pop <- po("colapply",
          applicator =  function(x) ifelse(x == "a", "c", x))


pop$param_set$values$affect_columns = selector_name("v1")

pop$train(list(task))[[1]]$data()

This gives the following output (see column v1, row 2):

  data v1 v2
1 3.15 c  1 
2 4.11 2  2 
3 3.56 c  2 

But the required output is:

  data v1 v2
1 3.15 c  1 
2 4.11 b  2 
3 3.56 c  2 

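I would like to do this using PipeOpColApply.

This is quite simple.

We need to define a function that takes the supplied input and performs the requested operation (the applicator).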

library(mlr3)
library(mlr3pipelines)

pop <- po("colapply",
          applicator =  function(x) ifelse(x == "a", "c", x))

We also need to define which columns the function will operate on:

pop$param_set$values$affect_columns = selector_name("v1")

pop$train(list(task))[[1]]$data()
#output
  data v1 v2
1: 3.15  c  1
2: 4.11  b  2
3: 3.56  c  2

This is very similar to the example in the function's help.

Data:
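
A possible explanation for the `2` in the question's output, offered as an assumption rather than something the question confirms: if `v1` is stored as a factor (for example under R versions before 4.0.0, where `data.frame()` converted strings to factors by default), `ifelse()` returns the factor's integer codes for the untouched values. The sessionInfo below shows R 4.0.2, where strings stay character, which may be why the same code works here. The base R sketch below reproduces the effect and shows one way to write an applicator that handles both cases; `rename_value` and its arguments `from` and `to` are hypothetical names introduced only for illustration.

```r
# Reproduce the effect: ifelse() on a factor yields integer codes
x_fct <- factor(c("a", "b", "a"))
ifelse(x_fct == "a", "c", x_fct)
# [1] "c" "2" "c"

# Hypothetical helper (not part of mlr3): renames one value and
# works for both factor and character input
rename_value <- function(x, from = "a", to = "c") {
  if (is.factor(x)) {
    # rename the level itself, so the column stays a factor
    levels(x)[levels(x) == from] <- to
    return(x)
  }
  # character (or other atomic) input: plain element-wise replacement
  ifelse(x == from, to, x)
}

x_chr <- c("a", "b", "a")
as.character(rename_value(x_fct))
# [1] "c" "b" "c"
rename_value(x_chr)
# [1] "c" "b" "c"
```

Since the helper takes the column as its first argument and has defaults for the rest, it could presumably be passed as `applicator = rename_value` in the `po("colapply", ...)` call above.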

df <- data.frame(v1 = c("a", "b", "a"),
                 v2 = c(1, 2, 2),
                 data = c(3.15, 4.11, 3.56))

task <- TaskRegr$new("bmsp", df, target = "data")


sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

Random number generation:
 RNG:     Mersenne-Twister 
 Normal:  Inversion 
 Sample:  Rounding 
 
locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] mlr3pipelines_0.3.0-9000 mlr3_0.7.0               Biostrings_2.56.0        XVector_0.28.0           IRanges_2.22.2           S4Vectors_0.26.1         BiocGenerics_0.34.0     

loaded via a namespace (and not attached):
 [1] Biobase_2.48.0       httr_1.4.2           bit64_4.0.5          splines_4.0.2        foreach_1.5.0        prodlim_2019.11.13   assertthat_0.2.1     lgr_0.3.4            askpass_1.1         
[10] BiocFileCache_1.12.1 blob_1.2.1           mlr3misc_0.5.0       progress_1.2.2       ipred_0.9-9          backports_1.1.10     pillar_1.4.6         RSQLite_2.2.1        lattice_0.20-41     
[19] glue_1.4.2           uuid_0.1-4           pROC_1.16.2          digest_0.6.25        checkmate_2.0.0      colorspace_1.4-1     recipes_0.1.13       Matrix_1.2-18        plyr_1.8.6          
[28] timeDate_3043.102    XML_3.99-0.5         pkgconfig_2.0.3      biomaRt_2.44.1       caret_6.0-86         zlibbioc_1.34.0      purrr_0.3.4          scales_1.1.1         gower_0.2.2         
[37] lava_1.6.8           tibble_3.0.3         openssl_1.4.3        generics_0.0.2       ggplot2_3.3.2        ellipsis_0.3.1       withr_2.3.0          nnet_7.3-14          paradox_0.4.0-9000  
[46] survival_3.1-12      magrittr_1.5         crayon_1.3.4         memoise_1.1.0        nlme_3.1-148         MASS_7.3-51.6        class_7.3-17         tools_4.0.2          data.table_1.13.0   
[55] prettyunits_1.1.1    hms_0.5.3            lifecycle_0.2.0      stringr_1.4.0        munsell_0.5.0        glmnet_4.0-2         AnnotationDbi_1.50.3 compiler_4.0.2       tinytex_0.26        
[64] rlang_0.4.7          grid_4.0.2           iterators_1.0.12     rstudioapi_0.11      rappdirs_0.3.1       gtable_0.3.0         ModelMetrics_1.2.2.2 codetools_0.2-16     DBI_1.1.0           
[73] curl_4.3             reshape2_1.4.4       R6_2.4.1             lubridate_1.7.9      dplyr_1.0.2          bit_4.0.4            biomartr_0.9.2       shape_1.4.5          stringi_1.5.3       
[82] Rcpp_1.0.5           vctrs_0.3.4          rpart_4.1-15         dbplyr_1.4.4         tidyselect_1.1.0     xfun_0.18