Why do prediction measures like accuracy and F1 not improve when selected features are applied?

Question

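I am building a supervised model using the mlr package. The steps I follow are:

1) Clean the data
2) Apply feature selection (Correlation-based Feature Selection, CFS)
3) Predict using the mlr package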

library(mlr)
mlr_data <- as.data.frame(scale(df_allF[,res.cfs]))
mlr_data$label <- factor(df_allF$label)

NAN_col <- sapply(mlr_data, function(x) all(is.nan(x)))
mlr_data <- mlr_data[,!NAN_col]

task <- makeClassifTask(data = mlr_data,target = "label")
task <- normalizeFeatures(task,method = "standardize")
lrn = makeLearner("classif.rpart", predict.type = "prob")
rdesc = makeResampleDesc("LOO")
rin = makeResampleInstance(rdesc, task)
# Define the hyperparameter search space for rpart
ps <- makeParamSet(
makeIntegerParam("minsplit",lower = 10, upper = 50),
makeIntegerParam("minbucket", lower = 5, upper = 50),
makeNumericParam("cp", lower = 0.001, upper = 0.2)
)
ctrl1 = makeTuneControlRandom(tune.threshold = TRUE)
lrn1 = tuneParams(lrn, resampling = rdesc,task = task, measures = acc, par.set = ps, control = ctrl1)

rpart.tree <- setHyperPars(lrn, par.vals = lrn1$x)

t.rpart <- train(rpart.tree, task)
getLearnerModel(t.rpart)

tpmodel <- predict(t.rpart, task)

cat("\nConfusion Matrix before setting Threshold:\n ")
calculateConfusionMatrix(tpmodel)

threshold.update <- lrn1$threshold
tpmodel <- setThreshold(tpmodel,threshold.update)

cat("\nConfusion Matrix after setting Threshold:\n ")
calculateConfusionMatrix(tpmodel)


cat("\nMeasures : ")
m1 <- performance(tpmodel, acc)
m2 <- measureF1(tpmodel$data$truth,tpmodel$data$response,"Healthy")

cat("F1 = ",m2,"Accuracy = ",m1)

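The F1 and accuracy results are:

Dataset with all features
F1 = 0.923, Accuracy = 0.928
Dataset with features selected by CFS
F1 = 0.863, Accuracy = 0.857
Dataset with features selected by information gain
F1 = 0.8947, Accuracy = 0.904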

Here, the results did not improve. The entire dataset contains 154 features and 42 rows.

Is there a solution or an explanation for this? I have tried most feature selection methods, but none of them brought an improvement.

Answer 1

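The feature selection methods you are using do not take the classifier's performance into account when selecting features. To do that, use a wrapper method, which explicitly evaluates candidate feature subsets by model performance. That said, there is no guarantee that feature selection will improve performance.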
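
As a rough illustration of the wrapper approach, here is a minimal sketch using mlr's `makeFeatSelWrapper`. It assumes the built-in `iris.task` as a stand-in for the `task` from the question. The sequential forward search, the `alpha` value, and the fold counts are illustrative choices, not recommendations. Feature selection runs inside each training call, so the outer resampling estimates how "selection plus model" performs on data not used for selection.

```r
library(mlr)

# Assumption: iris.task (shipped with mlr) stands in for the asker's own task.
task <- iris.task

# Same base learner as in the question.
lrn <- makeLearner("classif.rpart")

# Inner resampling: used only to score candidate feature subsets.
inner <- makeResampleDesc("CV", iters = 3)

# Sequential forward search; alpha is the minimum improvement
# required to add another feature (illustrative value).
ctrl <- makeFeatSelControlSequential(method = "sfs", alpha = 0.01)

# Wrap the learner so that feature selection is part of training.
lrn_fs <- makeFeatSelWrapper(lrn, resampling = inner, measures = acc,
                             control = ctrl, show.info = FALSE)

# Outer resampling: evaluates the whole pipeline on held-out folds.
outer <- makeResampleDesc("CV", iters = 5)
set.seed(1)
res <- resample(lrn_fs, task, resampling = outer, measures = acc,
                extract = getFeatSelResult, show.info = FALSE)

# Aggregated outer-fold accuracy.
print(res$aggr)

# Features selected in each outer fold.
print(lapply(res$extract, function(r) r$x))
```

With only 42 rows, as in the question, the selected subsets may differ noticeably between folds, which is one reason the measured performance may still not improve.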

Tags: performance, r, measure, feature-selection, mlr