Why are some of my images ignored in TensorFlow while other images in the same folders are not?

Question

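I have data in two folders (10,000 images for training and 1,000 images for validation), and each of these folders contains 10 subfolders, one per class. I put all of this data into dataframes for later use. It turns out that when I use "flow_from_dataframe" in TensorFlow, some images in some folders are treated as having invalid filenames and are therefore ignored.

I also tried accessing the images outside TensorFlow, for example by simply opening an image, but I still cannot access some files even though the path is exactly right: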

from PIL import Image
# Raw string so the backslashes in the Windows path are never read as escape sequences
im = Image.open(r"D:\Ensino Superior\ISCTE-IUL\Mestrado em Engenharia Informatica\Tese\Testagem\TomatoLeafDisease\data\Train\Tomato___Tomato_mosaic_virus\Tomato___Tomato_mosaic_virus_original_f16eeb0f-5219-4a81-9941-351b3d9ba5fc___PSU_CG 2089.JPG_a88e521f-cec2-4755-871f-782de8192056.JPG")
im.show()

[Image: output when trying to access an image outside the TensorFlow code]

[Image: the dataframe]

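I did some research and found that using absolute paths can help, but even then some images are still ignored. What can I do so that no images are ignored?

The code that runs before the error output: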

import tensorflow as tf

def create_gen():
    train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
        validation_split=0.2
    )
    test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
    )

    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='training',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

    val_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='validation',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

    test_images = test_generator.flow_from_dataframe(
        dataframe=validation_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=False
    )

    # Return all five objects, matching the unpacking at the call site
    return train_generator,test_generator,train_images,val_images,test_images

pretrained_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg'
)

train_generator,test_generator,train_images,val_images,test_images = create_gen()

[Image: output after using "flow_from_dataframe" with TensorFlow]

Answer 1

Your generators are set up incorrectly. For example, you have this code:

    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='training',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

The parameters rotation_range, zoom_range, width_shift_range, height_shift_range, shear_range, horizontal_flip and fill_mode are not arguments of .flow_from_dataframe. They belong in the ImageDataGenerator constructor, like this:

train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
    validation_split=0.2,
    rotation_range=30,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest"
)
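
To make the split between the two kinds of arguments explicit, here is one possible sketch of the whole function. The dictionary names AUGMENTATION_KWARGS and FLOW_KWARGS are made up for illustration, and train_df and validation_df are passed in as parameters instead of being read as globals. It assumes a TensorFlow version in which ImageDataGenerator is still available.

```python
# Augmentation options: these are ImageDataGenerator constructor arguments.
AUGMENTATION_KWARGS = dict(
    rotation_range=30,
    zoom_range=0.15,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.15,
    horizontal_flip=True,
    fill_mode="nearest",
)

# Data-loading options: these are flow_from_dataframe arguments.
FLOW_KWARGS = dict(
    x_col="Filepath",
    y_col="Class",
    target_size=(224, 224),
    color_mode="rgb",
    class_mode="categorical",
    batch_size=32,
)


def create_gen(train_df, validation_df):
    # Imported here so the option dictionaries above can be inspected
    # without TensorFlow installed (a choice made for this sketch only).
    import tensorflow as tf

    preprocess = tf.keras.applications.mobilenet_v2.preprocess_input
    train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=preprocess,
        validation_split=0.2,
        **AUGMENTATION_KWARGS,
    )
    test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=preprocess,
    )

    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df, shuffle=True, seed=0, subset="training", **FLOW_KWARGS
    )
    val_images = train_generator.flow_from_dataframe(
        dataframe=train_df, shuffle=True, seed=0, subset="validation", **FLOW_KWARGS
    )
    test_images = test_generator.flow_from_dataframe(
        dataframe=validation_df, shuffle=False, **FLOW_KWARGS
    )
    return train_generator, test_generator, train_images, val_images, test_images
```

Note that because the validation subset comes from the same train_generator, it receives the same augmentation as the training subset; a separate generator without augmentation may be preferable for validation.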

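For the image path problem, try running the script below. I use sdir as the directory that contains your data; just substitute the correct directory name.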

import os
from PIL import Image
sdir=r'C:\Temp\balls\train' # set this to your directory
good_image_count=0
bad_image_count=0
processed_count=0
bad_image_list=[]
classlist=os.listdir(sdir)
for klass in classlist:
    print ('processing class directory ', klass)
    classpath=os.path.join(sdir, klass)
    if os.path.isdir(classpath):
        flist=os.listdir(classpath)
        print (' number of files in class directory ', klass,' is ', len(flist))
        for f in flist:
            processed_count +=1
            fpath=os.path.join(classpath, f)
            if os.path.isfile(fpath):
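                try:
                    # verify() checks file integrity without decoding the full image
                    with Image.open(fpath) as image:
                        image.verify()
                    good_image_count +=1
                except Exception:
                    # unreadable, corrupt, or inaccessible file
                    bad_image_count +=1
                    bad_image_list.append(fpath)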
            else:
                print('in class directory ', klass, ' you have sub directories, you should only have files in it')
    else:
        print ('In the sdir you have files, you should only have class subdirectories in sdir')
print(processed_count, ' files were processed') 
print(good_image_count, ' files were valid image files')
print(bad_image_count, ' files were invalid images or paths did not exist')
for f in bad_image_list:
    print (f)
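
The script above walks the directory tree. To check the paths stored in the dataframe itself, something along these lines may help. It is only a sketch: the function name split_valid_paths and the extension list are assumptions for illustration, and the checks (known extension, file exists) only approximate the kind of filename validation that flow_from_dataframe performs.

```python
import os

# Extensions assumed to be acceptable; adjust to match your data.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".gif", ".tif", ".tiff"}


def split_valid_paths(paths):
    """Return (valid, invalid); invalid holds (path, reason) tuples."""
    valid, invalid = [], []
    for path in paths:
        path = str(path)
        ext = os.path.splitext(path)[1].lower()
        if ext not in IMAGE_EXTENSIONS:
            invalid.append((path, "unexpected extension"))
        elif not os.path.isfile(path):
            invalid.append((path, "file not found or not accessible"))
        else:
            valid.append(path)
    return valid, invalid
```

It accepts any iterable of paths, so it could be called as split_valid_paths(train_df['Filepath']) and the invalid list printed to see which rows would be dropped and why.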

Answer 2

Sorted it out. The problem was that the image path combined with the image file name was too long.
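
For anyone hitting the same issue, a quick way to spot such files is to list every path at or above a length limit. The sketch below assumes the classic Windows MAX_PATH limit of 260 characters; the function name find_long_paths and the constant name are made up for illustration.

```python
import os

# Assumption: the classic Windows MAX_PATH limit of 260 characters,
# which includes the terminating null character.
WINDOWS_MAX_PATH = 260


def find_long_paths(root, limit=WINDOWS_MAX_PATH):
    """Return absolute paths of files under root whose length reaches the limit."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.abspath(os.path.join(dirpath, name))
            if len(full) >= limit:
                too_long.append(full)
    return sorted(too_long)
```

If this reports any files, shortening the file names, moving the dataset closer to the drive root, or (on recent Windows versions) enabling long-path support are possible ways around it.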

Tags: python, machine-learning, pandas, tensorflow, tf.keras