How to load data in TensorFlow from subdirectories

Question

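I have a subset of the ImageNet data stored in local subfolders, where each subfolder holds the images of one class. There may be hundreds of classes, and therefore hundreds of subfolders, and each subfolder can contain hundreds of images. Below is an example of the structure with a subset of the folders. I want to train a classification model in TensorFlow, but I am not sure how to format and load the data, given that the image classes live in different folders and the class label is the folder name. Usually I just use datasets that already ship with TensorFlow, such as mnist or cifar10, which are pre-formatted and easy to use.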

Answer 1

You can use tf.keras.preprocessing.image_dataset_from_directory().

Your directory structure should look like this, only with more classes:

main_directory/
...class_a/
......a_image_1.jpg
......a_image_2.jpg
...class_b/
......b_image_1.jpg
......b_image_2.jpg

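I suggest you split the dataset before this step, because I believe the data is split randomly here rather than by stratified sampling. If your dataset is imbalanced, do the split yourself first and do not rely on the validation split option, since the documentation does not say how the split is done.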
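One way to do such a split up front is sketched below. It is only an illustration using the Python standard library: the function name stratified_split, the train/val output layout and the 20% default are assumptions, not part of TensorFlow. It holds out the same fraction of files from every class folder, so each class is represented proportionally in both splits.

```python
import random
import shutil
from pathlib import Path


def stratified_split(src_dir, dst_dir, val_fraction=0.2, seed=0):
    """Copy src_dir/<class>/* into dst_dir/train/<class>/ and dst_dir/val/<class>/,
    holding out the same fraction of files from every class (hypothetical helper)."""
    rng = random.Random(seed)
    src, dst = Path(src_dir), Path(dst_dir)
    counts = {}
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        files = sorted(f for f in class_dir.iterdir() if f.is_file())
        rng.shuffle(files)  # shuffle within the class so the held-out files are random
        n_val = round(len(files) * val_fraction)
        splits = {"val": files[:n_val], "train": files[n_val:]}
        for split_name, split_files in splits.items():
            out_dir = dst / split_name / class_dir.name
            out_dir.mkdir(parents=True, exist_ok=True)
            for f in split_files:
                shutil.copy2(f, out_dir / f.name)
        # record (number of training files, number of validation files) per class
        counts[class_dir.name] = (len(files) - n_val, n_val)
    return counts
```

The resulting train and val folders can then each be passed as the directory argument of a separate loading call, with no validation split option set.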

Example:

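from tensorflow.keras.preprocessing import image_dataset_from_directory

# TRAIN_DIR, SIZE (e.g. (224, 224)) and SEED are placeholders you define yourself.
train_dataset = image_dataset_from_directory(
    directory=TRAIN_DIR,
    labels="inferred",             # derive labels from the folder structure
    label_mode="categorical",      # one-hot encoded labels
    class_names=["0", "10", "5"],  # must match the subdirectory names
    image_size=SIZE,
    seed=SEED,
    subset=None,
    interpolation="bilinear",
    follow_links=False,
)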

Important things you have to set:

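labels must be "inferred": the image labels are generated from the directory structure, so they follow the order of the classes.
label_mode must be set to "categorical", which encodes the labels as categorical (one-hot) vectors.
class_names: you can set this yourself by listing the folder names in the order you want; otherwise the order is alphanumeric. Since you have many folders, you can use os.walk(directory) to collect the directory names instead of typing them out (sort the result if you need a deterministic order).
image_size: the images are resized to a common size. Choose it according to the model you use, e.g. MobileNet accepts (224, 224), so you can set it to (224, 224).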
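As a rough sketch of collecting the class names programmatically, the helper below lists the immediate subfolders of a directory and sorts them. The name list_class_names is made up for illustration, and it uses os.scandir rather than os.walk because only the top level is needed. Plain string sorting should correspond to the alphanumeric order mentioned above, which is why "10" comes before "5".

```python
import os


def list_class_names(directory):
    """Return the names of the immediate subdirectories of `directory`, sorted as strings."""
    return sorted(entry.name for entry in os.scandir(directory) if entry.is_dir())


def class_indices(class_names):
    """Map each class name to its position, i.e. its index in the one-hot label vector."""
    return {name: index for index, name in enumerate(class_names)}
```

The returned list could be passed as class_names, and the index mapping is handy for turning a predicted index back into a folder name.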

More information here.

Answer 2

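You can use ImageDataGenerator.flow_from_directory. The documentation is here. Assume your subdirectories are in a directory named main_dir. Set the size of the images you want to process; below I use 224 x 224 and also specify color images. class_mode is set to 'categorical', so use categorical cross-entropy as the loss when you compile the model. Then use the code below.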

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# One generator configured with a validation split; both subsets are drawn from it.
datagen = ImageDataGenerator(validation_split=.2, rescale=1/255)
train_gen = datagen.flow_from_directory(main_dir, target_size=(224, 224),
    color_mode="rgb", class_mode="categorical", batch_size=32, shuffle=True,
    seed=123, subset='training')
valid_gen = datagen.flow_from_directory(main_dir, target_size=(224, 224),
    color_mode="rgb", class_mode="categorical", batch_size=32, shuffle=False,
    seed=123, subset='validation')
# make and compile your model, then fit the model as shown below
history = model.fit(x=train_gen, epochs=20, verbose=1, validation_data=valid_gen,
                    shuffle=True, initial_epoch=0)
