How to fix a PipeOp's state?

Question

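How can we fix a PipeOp's $state so that its parameters (or configuration) are set from the start and stay the same during both training and prediction?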

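library(mlr3)
library(mlr3pipelines)
library(mlr3learners)  # provides lrn("classif.xgboost"); requires the xgboost package

task = tsk("iris")
pos1 = po("scale", param_vals = list(
    center = TRUE,
    scale = TRUE,
    affect_columns = selector_name("Sepal.Width")))

pos1$state  # NULL: the PipeOp has not been trained yet
# Attempt to fix the scaling parameters by writing into $state
pos1$state$center <- c(Sepal.Width = 0)
pos1$state$scale <- c(Sepal.Width = 2)

graph <- pos1 %>>% lrn("classif.xgboost", eval_metric = "mlogloss")
gl <- GraphLearner$new(graph)
gl$train(task)
gl$state  # the center/scale stored for "scale" were recomputed from the data, not 0 and 2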

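In the code above, the parameters center and scale of po("scale") are recomputed from the data, even though I tried to fix them to zero and two, respectively (I am not sure whether I did this correctly).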

Answer 1

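A PipeOp's $state should never be changed manually. It is more of a logging slot for you to inspect: it is where the PipeOp stores all the information it needs to perform the prediction step after training, and it is overwritten whenever $train() is called.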
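A minimal sketch of what happens to hand-written values in $state, assuming current mlr3/mlr3pipelines and using the PipeOp on its own (no xgboost needed). The values written into $state are simply replaced when the PipeOp is trained; the training output is standardized with the data-derived mean and standard deviation.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pos1 = po("scale", affect_columns = selector_name("Sepal.Width"))

# Before training there is no state yet
is.null(pos1$state)

# Hand-written values in $state ...
pos1$state = list(center = c(Sepal.Width = 0), scale = c(Sepal.Width = 2))

# ... are discarded: $train() recomputes the state from the training data
out = pos1$train(list(task))[[1]]
sw = out$data(cols = "Sepal.Width")[[1]]

# Mean 0 and sd 1, i.e. the data-derived center and scale were used
# (with center 0 and scale 2 the mean would instead be about 1.53)
round(c(mean = mean(sw), sd = sd(sw)), 10)
```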

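PipeOpScale always scales the training data to mean 0 and divides them by their root-mean-square (see ?scale; after centering this equals the standard deviation), and it stores the "learned" parameters (i.e. the mean and root-mean-square of the training data, the same values scale() returns as attributes) as $state. During prediction, the new data are transformed with these stored training parameters, so the transformed prediction data can have a mean and root-mean-square different from 0 and 1.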
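A sketch of the train/predict behaviour described above. The split into rows 1-100 for training and rows 101-150 for prediction is an arbitrary choice for illustration only. The prediction rows are transformed with the training mean and standard deviation, which the hand-computed `expected` vector reproduces.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
# Illustrative split (an assumption for this sketch)
train_task = task$clone()$filter(1:100)
pred_task = task$clone()$filter(101:150)

pos = po("scale", affect_columns = selector_name("Sepal.Width"))
pos$train(list(train_task))

# Transform the prediction rows with the trained PipeOp
pred_sw = pos$predict(list(pred_task))[[1]]$data(cols = "Sepal.Width")[[1]]

# Same result computed by hand from the TRAINING mean and sd
train_sw = iris$Sepal.Width[1:100]
expected = (iris$Sepal.Width[101:150] - mean(train_sw)) / sd(train_sw)
all.equal(pred_sw, expected)

# Mean and sd of the transformed prediction data are generally not 0 and 1
round(c(mean = mean(pred_sw), sd = sd(pred_sw)), 2)
```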

Assuming you want "Sepal.Width" to have mean 0 and root-mean-square 2 during both training and prediction (as your code above suggests; this may be a bad idea, though), you can use PipeOpColApply:

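library(mlr3)
library(mlr3pipelines)

# Standardize x, then rescale it to mean 0 and root-mean-square (sd) 2
f = function(x) {
  scale(x)[, 1] * 2 + 0
}

task = tsk("iris")
pos = po("colapply", applicator = f, affect_columns = selector_name("Sepal.Width"))

# Train and extract the transformed feature columns
train_out = pos$train(list(task))[[1]]$data(cols = task$feature_names)
round(colMeans(train_out), 2)                     # Sepal.Width: mean 0
round(apply(train_out, MARGIN = 2, FUN = sd), 2)  # Sepal.Width: sd 2

pos$state  # no learned center/scale values are stored here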
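A sketch of why this may be a bad idea, assuming the same `f` as above: PipeOpColApply re-applies the applicator at prediction time, so each prediction batch is standardized with its own statistics, and a single observation yields NaN because scale() divides 0 by 0. The last part shows a different, literal reading of the question (subtract 0, divide by 2) as a constant applicator; the name `fixed` is purely illustrative.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Applicator from the answer: rescales x using x's OWN mean and sd
f = function(x) {
  scale(x)[, 1] * 2 + 0
}
pos = po("colapply", applicator = f, affect_columns = selector_name("Sepal.Width"))
pos$train(list(task))

# At prediction time f() is re-applied, so the batch (rows 101-150)
# is standardized with its own mean and sd
sub = task$clone()$filter(101:150)
sub_sw = pos$predict(list(sub))[[1]]$data(cols = "Sepal.Width")[[1]]
round(c(mean = mean(sub_sw), sd = sd(sub_sw)), 2)

# A single observation has no spread: scale() computes 0 / 0, giving NaN
one = task$clone()$filter(150)
one_sw = pos$predict(list(one))[[1]]$data(cols = "Sepal.Width")[[1]]
one_sw

# Literal reading of the question: the constant transformation (x - 0) / 2
# uses no data-derived statistics, so train and predict behave identically
fixed = po("colapply", applicator = function(x) (x - 0) / 2,
  affect_columns = selector_name("Sepal.Width"))
fixed$train(list(task))
fixed_sw = fixed$predict(list(one))[[1]]$data(cols = "Sepal.Width")[[1]]
fixed_sw  # Sepal.Width of row 150 is 3.0, so this is 1.5
```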


mlr3