Does the "tpot" model object automatically apply any scaling or other transformations when .score or .predict is called on new out-of-sample data?

Question

Below is the basic code for training a model in TPOT:

from tpot import TPOTClassifier
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target,
                                                    train_size=0.75, test_size=0.25, random_state=42)

tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
tpot.fit(X_train, y_train)
print(tpot.score(X_test, y_test))

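At the end, it scores the data in the test set without explicitly applying the transformations that were done on the training set. A couple of questions here.

Does the "tpot" model object automatically apply any scaling or other transformations when .score or .predict is called on new out-of-sample data?
If not, what is the correct way to apply the transformations to the test set before calling .score or .predict?

If I am completely misunderstanding this, please point me in the right direction. Thanks.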

Answer 1

Does the "tpot" model object automatically apply any scaling or other transformations when .score or .predict is called on new out-of-sample data?

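It depends on the final pipeline that TPOT selects. If that pipeline contains any data scaling or transformation steps, those steps are fitted on the training data and are then applied correctly inside the predict and score functions as well.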

This is because, under the hood, TPOT is optimizing scikit-learn Pipeline objects.
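To illustrate the point about Pipeline objects, here is a minimal sketch that uses scikit-learn only. The pipeline below is a hand-built stand-in (an assumption for illustration, not something TPOT produced); it checks that calling score on a fitted Pipeline gives the same result as scaling the test set by hand and then scoring the classifier.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42)

# Hand-built stand-in for a pipeline that TPOT might select (illustrative assumption).
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
pipeline.fit(X_train, y_train)

# score() passes X_test through the already-fitted scaler before the classifier.
pipeline_score = pipeline.score(X_test, y_test)

# The same computation done by hand, reusing the fitted steps.
scaler = pipeline.named_steps["standardscaler"]
clf = pipeline.named_steps["logisticregression"]
manual_score = clf.score(scaler.transform(X_test), y_test)

print(pipeline_score, manual_score)
```

After tpot.fit(...), the pipeline that TPOT actually chose can be inspected through the fitted_pipeline_ attribute, which should make it clear whether any scaling step is present.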

That said, if you want to guarantee that a specific transformation is applied to your data, you have a couple of options:

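You can split your data into training and test sets, learn the transformation (e.g., StandardScaler) on the training set, and then apply it to your test set as well. You would do both of these steps before passing the data to TPOT.
You can use TPOT's template functionality, which lets you specify constraints on what the analysis pipeline should look like.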
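The two options above could look roughly like the sketch below. The helper name run_tpot is hypothetical, and the template string "StandardScaler-Classifier" assumes an installed TPOT version whose TPOTClassifier accepts the template parameter with operator names; check the documentation for your version before relying on it.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, train_size=0.75, test_size=0.25, random_state=42)

# Option 1: learn the scaling on the training set only, then reuse it on the test set.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)


def run_tpot(use_template=False):
    """Hypothetical helper: run a small TPOT search (this can take several minutes)."""
    # Imported lazily so the scaling code above runs even without TPOT installed.
    from tpot import TPOTClassifier

    if use_template:
        # Option 2: constrain every candidate pipeline to start with StandardScaler.
        # Assumes this TPOT version supports `template` with operator names.
        tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2,
                              random_state=42, template="StandardScaler-Classifier")
        tpot.fit(X_train, y_train)
        return tpot.score(X_test, y_test)

    # Option 1 continued: TPOT only ever sees the pre-scaled data.
    tpot = TPOTClassifier(generations=5, population_size=50, verbosity=2, random_state=42)
    tpot.fit(X_train_scaled, y_train)
    return tpot.score(X_test_scaled, y_test)
```

With option 1, any new data must go through the same fitted scaler before being passed to predict or score. With option 2, the scaling lives inside the pipeline, so raw data can be passed directly.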

machine-learning

automl

tpot