Custom ML function not working: undefined columns selected
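I am trying to write a custom function that performs logistic-regression-based ML using the caTools package, but I keep getting the error message: undefined columns selected.
I checked the inputs passed to the xlearn and ylearn arguments in my logit_boost function and, as described in the documentation, they are respectively a data frame containing the features and a vector of labels. So I am not sure what I am doing wrong.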
# needed libraries
library(dplyr)
library(rlang)
library(caTools)

# function body
logit_boost <- function(data, x, y, split_size = 0.8) {
  # creating a dataframe
  data <-
    dplyr::select(.data = data,
                  !!rlang::enquo(x),
                  !!rlang::enquo(y))

  # for reproducibility
  set.seed(123)

  # creating indices to choose rows from the data
  train_indices <-
    base::sample(x = base::seq_len(length.out = nrow(data)),
                 size = floor(split_size * nrow(data)))

  # training dataset
  train <- data[train_indices, ]

  # testing dataset
  test <- data[-train_indices, ]

  # defining label column we are interested in and everything else
  label_train <-
    train %>% dplyr::select(.data = ., !!rlang::enquo(x))
  data_train <-
    train %>% dplyr::select(.data = ., -!!rlang::enquo(x))

  # training model (y ~ x)
  logit_model <-
    caTools::LogitBoost(xlearn = data_train,
                        ylearn = label_train)

  # prediction
  # stats::predict(object = logit_model, test, type = "raw")
}

logit_boost(data = mtcars, x = am, y = mpg)
#> Error in `[.data.frame`(x, order(x, na.last = na.last, decreasing = decreasing)): undefined columns selected
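In the Examples section of help(LogitBoost), Label = iris[, 5] produces a vector, which is what the ylearn argument of LogitBoost() expects.
In your code, label_train <- train %>% dplyr::select(.data = ., !!rlang::enquo(x)) produces a data.frame. By design, dplyr behaves as if drop = FALSE when only one column is selected (and even ignores the argument), so the result is never simplified to a vector.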
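To make the difference concrete, here is a small sketch on mtcars (the dataset from the question). The names label_df and label_vec are only illustrative; the point is that dplyr::select() keeps the data frame wrapper while dplyr::pull() and base subsetting return a plain vector.

```r
library(dplyr)

# select() returns a data.frame even when a single column is chosen
label_df <- dplyr::select(mtcars, am)
is.data.frame(label_df)

# pull() extracts that column as a plain (here numeric) vector
label_vec <- dplyr::pull(mtcars, am)
is.data.frame(label_vec)
is.numeric(label_vec)

# base R equivalents that also yield a vector
identical(label_vec, mtcars[["am"]])
identical(label_vec, mtcars[, "am"])
```

The first is.data.frame() call should give TRUE and the second FALSE, which is the distinction that matters for ylearn.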
One fix is to extract the label column as a vector with dplyr::pull():
logit_model <- caTools::LogitBoost(xlearn = data_train, ylearn = dplyr::pull(label_train))
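For completeness, a minimal sketch of how the whole function might look with that change folded in. The name logit_boost_fixed is hypothetical, the body otherwise follows the question's code, and the prediction step is left out; treat it as one possible arrangement rather than the definitive version.

```r
library(dplyr)
library(rlang)
library(caTools)

# hypothetical variant of the question's function: x is the label column
logit_boost_fixed <- function(data, x, y, split_size = 0.8) {
  # keep only the two columns of interest
  data <- dplyr::select(data, !!rlang::enquo(x), !!rlang::enquo(y))

  # for reproducibility
  set.seed(123)

  # row indices for the training split
  train_indices <- sample(seq_len(nrow(data)),
                          size = floor(split_size * nrow(data)))
  train <- data[train_indices, ]

  # labels as a plain vector (pull), features as a data frame (select)
  label_train <- dplyr::pull(train, !!rlang::enquo(x))
  data_train <- dplyr::select(train, -!!rlang::enquo(x))

  # ylearn now receives a vector, as in the help(LogitBoost) example
  caTools::LogitBoost(xlearn = data_train, ylearn = label_train)
}

model <- logit_boost_fixed(data = mtcars, x = am, y = mpg)
class(model)
```

Pulling the label inside the function keeps label_train a vector from the start, so nothing downstream has to remember to convert it.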