Custom ML function not working: undefined columns selected

Question

I am trying to write a custom function that performs logistic-regression-based ML using the caTools package, but I keep getting the error: undefined columns selected.

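I checked the inputs passed to the xlearn and ylearn arguments of the LogitBoost function and, as described in the documentation, they are respectively a data frame containing the features and a vector of labels. So I am not sure what I am doing wrong.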

# needed libraries
library(dplyr)
library(rlang)
library(caTools)

# function body
logit_boost <- function(data, x, y, split_size = 0.8) {
  # creating a dataframe
  data <-
    dplyr::select(.data = data,
                  !!rlang::enquo(x),
                  !!rlang::enquo(y))

  # for reproducibility
  set.seed(123)

  # creating indices to choose rows from the data
  train_indices <-
    base::sample(x = base::seq_len(length.out = nrow(data)),
                 size = floor(split_size * nrow(data)))

  # training dataset
  train <- data[train_indices, ]

  # testing dataset
  test <- data[-train_indices, ]

  # defining label column we are interested in and everything else
  label_train <-
    train %>% dplyr::select(.data = ., !!rlang::enquo(x))

  data_train <-
    train %>% dplyr::select(.data = ., -!!rlang::enquo(x))

  # training model (y ~ x)
  logit_model <-
    caTools::LogitBoost(xlearn = data_train,
                        ylearn = label_train)

  # prediction
  # stats::predict(object = logit_model, test, type = "raw")
}

logit_boost(data = mtcars, x = am, y = mpg)
#> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)): undefined columns selected

Answer 1

In the Examples section of help(LogitBoost), Label = iris[, 5] produces a vector, which is what the ylearn argument of LogitBoost() expects.

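In your code, label_train <- train %>% dplyr::select(.data = ., !!rlang::enquo(x)) results in a data.frame. By design, dplyr behaves as if drop = FALSE when only one column is selected (and even ignores the argument), so the result is never simplified to a vector.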
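To make the difference concrete, here is a minimal sketch using the built-in mtcars data. The object names base_vec and sel_df are only illustrative and are not part of the original code.

```r
library(dplyr)

# Base R: selecting a single column with `[` drops to a plain vector by default
base_vec <- mtcars[, "am"]

# dplyr: select() keeps the data frame structure, even for a single column
sel_df <- dplyr::select(mtcars, am)

is.data.frame(base_vec)  # a vector, so this is FALSE
is.data.frame(sel_df)    # still a data frame, so this is TRUE
```

Passing something like sel_df as ylearn is what appears to trigger the error in the question, since the traceback shows a data frame being indexed by order().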

We could do:

logit_model <- caTools::LogitBoost(xlearn = data_train, ylearn = dplyr::pull(label_train))
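As an alternative, the label vector could be extracted directly inside the function, skipping the intermediate one-column data frame. The sketch below assumes a hypothetical helper named get_label (not part of the original code) and relies on dplyr::pull() accepting an unquoted column captured with rlang::enquo().

```r
library(dplyr)
library(rlang)

# Hypothetical helper: returns the column `x` of `data` as a plain vector
get_label <- function(data, x) {
  dplyr::pull(data, !!rlang::enquo(x))
}

label_vec <- get_label(data = mtcars, x = am)

# label_vec is an atomic vector with one entry per row of mtcars
is.atomic(label_vec)
length(label_vec) == nrow(mtcars)
```

A vector obtained this way could then be passed as ylearn in the same way as dplyr::pull(label_train).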

r

machine-learning

rlang