mlr3 AutoFSelector glmnet: Error in (if (cv) glmnet::cv.glmnet else glmnet::glmnet)(x = data, y = target, : x should be a matrix with 2 or more columns

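I am a beginner with mlr3 and ran into a problem when running an AutoFSelector learner combined with glmnet on a classification task with more than 2000 numeric variables. I reproduced the error with the simpler Sonar task. Note that I am using R version 4.1.2 (2021-11-01) on macOS Monterey 12.1. All required packages were installed from CRAN.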

library(mlr3verse)
#> Loading required package: mlr3

data("Sonar", package = "mlbench")

task <- as_task_classif(Sonar, target = "Class", positive = "R")

fselector <- fs("sequential")

terminator <- trm("evals", n_evals = 10)

afs <- AutoFSelector$new(
  learner = lrn("classif.glmnet", predict_type = "prob"),
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = terminator,
  fselector = fselector
)

rr <- resample(task, afs, resampling = rsmp("cv", folds = 3), store_models = TRUE)
#> INFO  [16:18:38.455] [mlr3] Applying learner 'classif.glmnet.fselector' on task 'Sonar' (iter 3/3) 
#> INFO  [16:18:38.527] [bbotk] Starting to optimize 60 parameter(s) with '<FSelectorSequential>' and '<TerminatorEvals> [n_evals=10, k=0]' 
#> INFO  [16:18:38.546] [bbotk] Evaluating 60 configuration(s) 
#> INFO  [16:18:41.674] [mlr3] Running benchmark with 180 resampling iterations 
#> INFO  [16:18:41.677] [mlr3] Applying learner 'select.classif.glmnet' on task 'Sonar' (iter 3/3)
#> Error in (if (cv) glmnet::cv.glmnet else glmnet::glmnet)(x = data, y = target, : x should be a matrix with 2 or more columns
#> This happened PipeOp classif.glmnet's $train()

Created on 2022-01-24 by the reprex package (v2.0.1)

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.1.2 (2021-11-01)
#>  os       macOS Monterey 12.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Europe/Zurich
#>  date     2022-01-24
#>  pandoc   2.17.0.1 @ /opt/homebrew/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package           * version    date (UTC) lib source
#>  assertthat          0.2.1      2019-03-21 [1] CRAN (R 4.1.2)
#>  backports           1.4.1      2021-12-13 [1] CRAN (R 4.1.1)
#>  bbotk               0.5.0      2022-01-19 [1] CRAN (R 4.1.2)
#>  checkmate           2.0.0      2020-02-06 [1] CRAN (R 4.1.2)
#>  cli                 3.1.1      2022-01-20 [1] CRAN (R 4.1.2)
#>  clue                0.3-60     2021-10-11 [1] CRAN (R 4.1.2)
#>  cluster             2.1.2      2021-04-17 [2] CRAN (R 4.1.2)
#>  clusterCrit         1.2.8      2018-07-26 [1] CRAN (R 4.1.2)
#>  codetools           0.2-18     2020-11-04 [2] CRAN (R 4.1.2)
#>  colorspace          2.0-2      2021-06-24 [1] CRAN (R 4.1.2)
#>  crayon              1.4.2      2021-10-29 [1] CRAN (R 4.1.2)
#>  data.table          1.14.2     2021-09-27 [1] CRAN (R 4.1.2)
#>  DBI                 1.1.2      2021-12-20 [1] CRAN (R 4.1.2)
#>  dictionar6          0.1.3      2021-09-13 [1] CRAN (R 4.1.2)
#>  digest              0.6.29     2021-12-01 [1] CRAN (R 4.1.2)
#>  distr6              1.6.4      2022-01-17 [1] CRAN (R 4.1.2)
#>  dplyr               1.0.7      2021-06-18 [1] CRAN (R 4.1.2)
#>  ellipsis            0.3.2      2021-04-29 [1] CRAN (R 4.1.2)
#>  evaluate            0.14       2019-05-28 [1] CRAN (R 4.1.2)
#>  fansi               1.0.2      2022-01-14 [1] CRAN (R 4.1.2)
#>  fastmap             1.1.0      2021-01-25 [1] CRAN (R 4.1.2)
#>  foreach             1.5.1      2020-10-15 [1] CRAN (R 4.1.2)
#>  fs                  1.5.2      2021-12-08 [1] CRAN (R 4.1.1)
#>  future              1.23.0     2021-10-31 [1] CRAN (R 4.1.2)
#>  future.apply        1.8.1      2021-08-10 [1] CRAN (R 4.1.2)
#>  generics            0.1.1      2021-10-25 [1] CRAN (R 4.1.2)
#>  ggplot2             3.3.5      2021-06-25 [1] CRAN (R 4.1.2)
#>  glmnet              4.1-3      2021-11-02 [1] CRAN (R 4.1.2)
#>  globals             0.14.0     2020-11-22 [1] CRAN (R 4.1.2)
#>  glue                1.6.1      2022-01-22 [1] CRAN (R 4.1.2)
#>  gtable              0.3.0      2019-03-25 [1] CRAN (R 4.1.2)
#>  highr               0.9        2021-04-16 [1] CRAN (R 4.1.2)
#>  htmltools           0.5.2      2021-08-25 [1] CRAN (R 4.1.2)
#>  iterators           1.0.13     2020-10-15 [1] CRAN (R 4.1.2)
#>  knitr               1.37       2021-12-16 [1] CRAN (R 4.1.1)
#>  lattice             0.20-45    2021-09-22 [2] CRAN (R 4.1.2)
#>  lgr                 0.4.3      2021-09-16 [1] CRAN (R 4.1.2)
#>  lifecycle           1.0.1      2021-09-24 [1] CRAN (R 4.1.2)
#>  listenv             0.8.0      2019-12-05 [1] CRAN (R 4.1.2)
#>  magrittr            2.0.1      2020-11-17 [1] CRAN (R 4.1.2)
#>  Matrix              1.4-0      2021-12-08 [1] CRAN (R 4.1.1)
#>  mlr3              * 0.13.1     2022-01-19 [1] CRAN (R 4.1.2)
#>  mlr3cluster         0.1.2      2021-09-03 [1] CRAN (R 4.1.2)
#>  mlr3data            0.5.0      2021-06-29 [1] CRAN (R 4.1.2)
#>  mlr3extralearners   0.5.18     2022-01-23 [1] Github (mlr-org/mlr3extralearners@54fa488)
#>  mlr3filters         0.4.2.9000 2021-12-05 [1] local
#>  mlr3fselect         0.6.1      2022-01-20 [1] CRAN (R 4.1.2)
#>  mlr3learners        0.5.2      2022-01-23 [1] CRAN (R 4.1.2)
#>  mlr3measures        0.4.1      2022-01-13 [1] CRAN (R 4.1.2)
#>  mlr3misc            0.10.0     2022-01-11 [1] CRAN (R 4.1.2)
#>  mlr3pipelines       0.4.0      2021-11-15 [1] CRAN (R 4.1.2)
#>  mlr3proba           0.4.3      2022-01-22 [1] CRAN (R 4.1.2)
#>  mlr3tuning          0.10.0     2022-01-20 [1] CRAN (R 4.1.2)
#>  mlr3verse         * 0.2.2      2021-08-11 [1] CRAN (R 4.1.2)
#>  mlr3viz             0.5.7      2021-10-14 [1] CRAN (R 4.1.2)
#>  munsell             0.5.0      2018-06-12 [1] CRAN (R 4.1.2)
#>  ooplah              0.2.0      2022-01-21 [1] CRAN (R 4.1.2)
#>  palmerpenguins      0.1.0      2020-07-23 [1] CRAN (R 4.1.2)
#>  paradox             0.7.1      2021-03-07 [1] CRAN (R 4.1.2)
#>  parallelly          1.30.0     2021-12-17 [1] CRAN (R 4.1.1)
#>  param6              0.2.3      2021-10-05 [1] CRAN (R 4.1.2)
#>  pillar              1.6.4      2021-10-18 [1] CRAN (R 4.1.2)
#>  pkgconfig           2.0.3      2019-09-22 [1] CRAN (R 4.1.2)
#>  purrr               0.3.4      2020-04-17 [1] CRAN (R 4.1.2)
#>  R.cache             0.15.0     2021-04-30 [1] CRAN (R 4.1.2)
#>  R.methodsS3         1.8.1      2020-08-26 [1] CRAN (R 4.1.2)
#>  R.oo                1.24.0     2020-08-26 [1] CRAN (R 4.1.2)
#>  R.utils             2.11.0     2021-09-26 [1] CRAN (R 4.1.2)
#>  R6                  2.5.1      2021-08-19 [1] CRAN (R 4.1.2)
#>  Rcpp                1.0.8      2022-01-13 [1] CRAN (R 4.1.2)
#>  reprex              2.0.1      2021-08-05 [1] CRAN (R 4.1.2)
#>  rlang               1.0.0      2022-01-22 [1] Github (r-lib/rlang@f2fbaad)
#>  rmarkdown           2.11       2021-09-14 [1] CRAN (R 4.1.2)
#>  rstudioapi          0.13       2020-11-12 [1] CRAN (R 4.1.2)
#>  scales              1.1.1      2020-05-11 [1] CRAN (R 4.1.2)
#>  sessioninfo         1.2.2      2021-12-06 [1] CRAN (R 4.1.2)
#>  set6                0.2.4      2021-10-18 [1] CRAN (R 4.1.2)
#>  shape               1.4.6      2021-05-19 [1] CRAN (R 4.1.2)
#>  stringi             1.7.6      2021-11-29 [1] CRAN (R 4.1.2)
#>  stringr             1.4.0      2019-02-10 [1] CRAN (R 4.1.2)
#>  styler              1.6.2      2021-09-23 [1] CRAN (R 4.1.1)
#>  survival            3.2-13     2021-08-24 [2] CRAN (R 4.1.2)
#>  tibble              3.1.6      2021-11-07 [1] CRAN (R 4.1.2)
#>  tidyselect          1.1.1      2021-04-30 [1] CRAN (R 4.1.2)
#>  utf8                1.2.2      2021-07-24 [1] CRAN (R 4.1.2)
#>  uuid                1.0-3      2021-11-01 [1] CRAN (R 4.1.2)
#>  vctrs               0.3.8      2021-04-29 [1] CRAN (R 4.1.2)
#>  withr               2.4.3      2021-11-30 [1] CRAN (R 4.1.2)
#>  xfun                0.29       2021-12-14 [1] CRAN (R 4.1.1)
#>  yaml                2.2.1      2020-02-01 [1] CRAN (R 4.1.2)
#> 
#>  [1] /Users/pjs/Library/R/arm64/4.1/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.1-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────

The error I get is shown below. It is specific to glmnet, since I can successfully run similar code with alternative learners such as classif.rpart.

Erreur dans (if (cv) glmnet::cv.glmnet else glmnet::glmnet)(x = data, y = target, : x should be a matrix with 2 or more columns This happened PipeOp classif.glmnet's $train()

Does anyone know how I can solve this?

Many thanks for your help,

Clemens

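This problem is specific to glmnet. glmnet needs at least two features to fit a model, but in at least one configuration (the first configurations of a sequential forward search) you only have one feature.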
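To see why the first batch is affected, here is a minimal base-R sketch of the candidate subsets a sequential forward search starts with. The feature names `V1` to `V60` are placeholders assumed for illustration, and the code only mimics the idea of the search; it does not call mlr3fselect or glmnet.

```r
# Placeholder names standing in for the 60 predictors of the Sonar data.
features <- paste0("V", 1:60)

# Forward search starts from the empty set, so the first batch contains
# every subset obtained by adding exactly one feature.
first_batch <- lapply(features, function(f) f)

# Number of columns the learner would receive for each candidate subset.
n_cols <- lengths(first_batch)

length(first_batch)  # 60 candidate subsets
all(n_cols == 1)     # TRUE: every candidate has a single column
any(n_cols >= 2)     # FALSE: none meets the "2 or more columns" requirement
```

This matches the log in the question: 60 configurations are evaluated, and with 3-fold cross-validation that gives the 180 resampling iterations reported just before the error.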

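There are two ways to solve this:

  1. Open an issue in mlr3fselect and request a new parameter min_features (there is already max_features) so that the search can start with 2 or more features.
  2. Augment the base learner with a fallback that is fitted if the base learner fails. Here is a fallback to a simple logistic regression: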
    learner = lrn("classif.glmnet", predict_type = "prob")
    learner$encapsulate = c(train = "evaluate", predict = "evaluate")
    learner$fallback = lrn("classif.log_reg", predict_type = "prob")
    
    Then pass this learner to AutoFSelector.
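Put together with the code from the question, the second option might look like the sketch below. It assumes the package versions listed in the session info (mlr3 0.13.1, mlr3fselect 0.6.1), where `encapsulate` and `fallback` are assignable fields of the learner. Newer mlr3 releases may configure encapsulation differently, so check the documentation for your installed version.

```r
library(mlr3verse)

data("Sonar", package = "mlbench")
task <- as_task_classif(Sonar, target = "Class", positive = "R")

# Base learner with a fallback: if glmnet fails (for example on a
# one-feature subset), a logistic regression is fitted instead.
learner <- lrn("classif.glmnet", predict_type = "prob")
learner$encapsulate <- c(train = "evaluate", predict = "evaluate")
learner$fallback <- lrn("classif.log_reg", predict_type = "prob")

# Same AutoFSelector setup as in the question, now using the guarded learner.
afs <- AutoFSelector$new(
  learner = learner,
  resampling = rsmp("cv", folds = 3),
  measure = msr("classif.ce"),
  terminator = trm("evals", n_evals = 10),
  fselector = fs("sequential")
)

rr <- resample(task, afs, resampling = rsmp("cv", folds = 3), store_models = TRUE)
```

Keep in mind that with this setup the scores of one-feature subsets come from the logistic regression fallback, not from glmnet, so they are not directly comparable with the scores of larger subsets.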