How to use cross-validation with keras image datasets from directories?

I have an image dataset in Keras, which I load separately into a training set and a validation set directly with the corresponding function:

from tensorflow import keras

tds = keras.preprocessing\
    .image_dataset_from_directory('dataset_folder', seed=123,
                                  validation_split=0.35, subset='training')

vds = keras.preprocessing\
    .image_dataset_from_directory('dataset_folder', seed=123,
                                  validation_split=0.35, subset='validation')
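The two calls above split the folder only once: 35% is held out, chosen with seed=123. K-fold cross-validation instead needs the full list of samples and their labels up front, so that the splitter can choose every fold itself. A minimal sketch using only the standard library enumerates the files with the labeling convention that image_dataset_from_directory documents for labels='inferred' (one subfolder per class, class indices assigned in alphanumeric order of the folder names). The function list_image_files and the extension set are assumptions for illustration, not part of Keras:

```python
from pathlib import Path

# Assumed to match the formats image_dataset_from_directory accepts (jpeg, png, bmp, gif).
IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.bmp', '.gif'}


def list_image_files(directory):
    """Return (paths, labels, class_names) for a one-subfolder-per-class layout."""
    root = Path(directory)
    # Alphanumeric order of the subfolders defines the integer labels 0, 1, 2, ...
    class_names = sorted(p.name for p in root.iterdir() if p.is_dir())
    paths, labels = [], []
    for label, name in enumerate(class_names):
        # Walk the class folder recursively, in a stable (sorted) order
        for f in sorted((root / name).rglob('*')):
            if f.is_file() and f.suffix.lower() in IMAGE_EXTENSIONS:
                paths.append(str(f))
                labels.append(label)
    return paths, labels, class_names
```

With the files listed this way, KFold or StratifiedKFold can split the indices range(len(paths)), and the images of each fold can be read from the selected paths.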

Then I go through the usual steps for my neural network:

from tensorflow.keras import layers
from tensorflow.keras.models import Sequential

num_classes = 5

model = Sequential([
    layers.experimental.preprocessing.Rescaling(1.0/255,
                                                input_shape=(256, 256, 3)),
    layers.Conv2D(16, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, padding='same', activation='relu'),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(num_classes)])

model\
    .compile(optimizer='adam', metrics=['accuracy'],
             loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))

hist = model.fit(tds, validation_data=vds, epochs=15)

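How can I implement cross-validation with KFold or StratifiedKFold from sklearn.model_selection? If I have to change the way the data is loaded to make this possible, I would also be glad to know how to do that.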
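For reference, scikit-learn's StratifiedKFold(n_splits=5, shuffle=True, random_state=123).split(X, y) yields pairs of index arrays (train_idx, val_idx) in which each class appears in roughly the same proportion as in y; KFold yields the same kind of pairs without looking at the labels. The NumPy function below is a simplified, hypothetical stand-in written only to show the stratification idea: it deals the shuffled samples of each class round-robin across the folds. It is not scikit-learn's algorithm, so its folds will differ from those of StratifiedKFold.

```python
import numpy as np


def stratified_kfold_indices(labels, n_splits=5, seed=123):
    """Yield (train_idx, val_idx) pairs whose validation folds share the class mix.

    Simplified illustration of stratification; not sklearn's implementation.
    """
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    fold_of = np.empty(len(labels), dtype=int)
    offset = 0
    for cls in np.unique(labels):
        idx = np.flatnonzero(labels == cls)
        rng.shuffle(idx)  # random order within this class
        # Deal this class across the folds, continuing where the previous class
        # stopped so that the fold sizes stay balanced overall
        fold_of[idx] = (offset + np.arange(len(idx))) % n_splits
        offset += len(idx)
    for k in range(n_splits):
        # Fold k is the validation set; every other sample is used for training
        yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)
```

Every sample lands in exactly one validation fold, which is the property that makes the k validation scores together cover the whole dataset.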

Have a look at these suggestions for implementing cross-validation in Keras:

https://machinelearningmastery.com/evaluate-performance-deep-learning-models-keras/

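Loading the data with image_dataset_from_directory produces a tf.data.Dataset object, and I'm not sure it works with the implementation above. An alternative is to convert the images to NumPy arrays, which can then be split with K-fold. For that, you can refer to the following: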
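A minimal sketch of the NumPy route, assuming TensorFlow 2.x (where keras.preprocessing.image_dataset_from_directory is available, as in the question), scikit-learn, and a dataset small enough to fit in memory (each 256x256x3 float32 image takes 0.75 MiB). The helper names load_as_arrays, build_model and cross_validate are hypothetical. The folder is loaded once without validation_split, the question's model is rebuilt from scratch for every fold so that no fold starts from weights trained on another fold, and one validation accuracy is returned per fold. The splitter is passed in, so KFold and StratifiedKFold are interchangeable; a zero placeholder is passed as X to split(), which the scikit-learn documentation notes is sufficient because only y is needed to generate the folds.

```python
import numpy as np


def load_as_arrays(directory, image_size=(256, 256)):
    """Read a one-subfolder-per-class directory into in-memory NumPy arrays."""
    from tensorflow import keras  # imported here so the CV loop below needs only NumPy

    # No validation_split/subset: the cross-validation splitter decides the folds.
    # shuffle=False gives a reproducible sample order.
    ds = keras.preprocessing.image_dataset_from_directory(
        directory, image_size=image_size, shuffle=False)
    images, labels = [], []
    for batch_x, batch_y in ds:
        images.append(batch_x.numpy())
        labels.append(batch_y.numpy())
    return np.concatenate(images), np.concatenate(labels)


def build_model(num_classes=5):
    """Return a freshly initialised, compiled copy of the question's model."""
    from tensorflow import keras
    from tensorflow.keras import layers

    model = keras.Sequential([
        layers.experimental.preprocessing.Rescaling(1.0 / 255, input_shape=(256, 256, 3)),
        layers.Conv2D(16, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, padding='same', activation='relu'),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation='relu'),
        layers.Dense(num_classes)])
    model.compile(optimizer='adam', metrics=['accuracy'],
                  loss=keras.losses.SparseCategoricalCrossentropy(from_logits=True))
    return model


def cross_validate(X, y, make_model, splitter, epochs=15):
    """Train one new model per fold and return each fold's validation accuracy.

    `splitter` is any object whose .split(X, y) yields (train_idx, val_idx),
    e.g. sklearn's KFold or StratifiedKFold.
    """
    scores = []
    for train_idx, val_idx in splitter.split(np.zeros(len(y)), y):
        model = make_model()  # fresh weights for every fold
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, accuracy = model.evaluate(X[val_idx], y[val_idx], verbose=0)  # [loss, accuracy]
        scores.append(accuracy)
    return np.array(scores)


# Example usage (requires TensorFlow and scikit-learn):
# from sklearn.model_selection import StratifiedKFold
# X, y = load_as_arrays('dataset_folder')
# skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
# scores = cross_validate(X, y, build_model, skf)
# print(f'accuracy: {scores.mean():.3f} +/- {scores.std():.3f}')
```

Reporting the mean and spread of the fold scores, rather than a single validation accuracy, is the point of the exercise: it shows how much the result depends on which images happened to land in the validation set.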

Note: the Machine Learning Mastery link given above contains the following statement:

Cross validation is often not used for evaluating deep learning models because of the greater computational expense. For example k-fold cross validation is often used with 5 or 10 folds. As such, 5 or 10 models must be constructed and evaluated, greatly adding to the evaluation time of a model.
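To put the quoted cost in numbers for this question, here is a rough estimate under a loud assumption: training time is taken to scale linearly with the number of training images seen per epoch, and evaluation time is ignored. Each of the k fold models trains on (k-1)/k of the data, so k-fold cross-validation does k-1 "whole dataset" units of training, compared with 0.65 of a unit for the question's single 65/35 split at the same number of epochs. The function name relative_training_cost is hypothetical.

```python
def relative_training_cost(n_splits, single_split_train_fraction=0.65):
    """Training work of k-fold CV relative to one train/validation split.

    Assumes cost is proportional to (number of models) x (fraction of data each trains on).
    """
    cv_work = n_splits * (n_splits - 1) / n_splits  # k models, each on (k-1)/k of the data
    return cv_work / single_split_train_fraction


for k in (5, 10):
    print(k, round(relative_training_cost(k), 2))  # 5 -> 6.15, 10 -> 13.85
```

Under this assumption, 5-fold cross-validation costs roughly six times, and 10-fold roughly fourteen times, the training of the single split used in the question.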