Why are some of my images ignored by TensorFlow when other images in the same folders are not?

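I have my data split across 2 folders (10,000 images for training and 1,000 for validation), and each of these folders contains 10 subfolders, one per class. I put all of this data into DataFrames for later use. It turns out that when I use "flow_from_dataframe" in TensorFlow, some images in some of the folders are treated as having invalid filenames and are therefore ignored.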
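For reference, here is a minimal sketch of how such a DataFrame might be assembled from a root/class/image folder layout. It assumes each class subfolder holds only image files, and collect_image_records is a hypothetical helper name. It records absolute paths (via os.path.abspath) next to each class name:

```python
import os

def collect_image_records(root_dir):
    """Walk root_dir/<class_name>/<file> and return (filepath, class_name) pairs.

    Hypothetical helper: assumes one subdirectory per class and that every
    file inside a class subdirectory is an image.
    """
    records = []
    for class_name in sorted(os.listdir(root_dir)):
        class_dir = os.path.join(root_dir, class_name)
        if not os.path.isdir(class_dir):
            continue  # skip stray files sitting directly in root_dir
        for fname in sorted(os.listdir(class_dir)):
            # Store absolute paths so the DataFrame does not depend on the working directory
            fpath = os.path.abspath(os.path.join(class_dir, fname))
            records.append((fpath, class_name))
    return records
```

The records can then be wrapped with pandas, for example pd.DataFrame(records, columns=['Filepath', 'Class']), which matches the x_col and y_col names used in the generator code.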

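I also tried to access some of these images outside TensorFlow, for example by simply opening them with PIL, but some files still cannot be opened even though the path is exactly correct: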

from PIL import Image

# Raw string (r"...") so the backslashes in the Windows path are never treated as escape sequences
im = Image.open(r"D:\Ensino Superior\ISCTE-IUL\Mestrado em Engenharia Informatica\Tese\Testagem\TomatoLeafDisease\data\Train\Tomato___Tomato_mosaic_virus\Tomato___Tomato_mosaic_virus_original_f16eeb0f-5219-4a81-9941-351b3d9ba5fc___PSU_CG 2089.JPG_a88e521f-cec2-4755-871f-782de8192056.JPG")
im.show()
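If a path is longer than the classic Windows MAX_PATH limit of 260 characters (the cause the asker eventually identified), one known workaround is the \\?\ extended-length prefix, which Windows file APIs, including the one behind Python's built-in open(), accept for absolute paths. A minimal sketch, where to_extended_path is a hypothetical helper; whether the Keras generators accept prefixed paths is an assumption this sketch does not establish:

```python
import os

# Windows extended-length prefix; lets Win32 file APIs bypass the 260-char MAX_PATH limit
_EXTENDED_PREFIX = "\\\\?\\"

def to_extended_path(path):
    r"""Return path with the Windows \\?\ extended-length prefix added.

    Hypothetical helper: on non-Windows systems, or if the prefix is already
    present, the path is returned unchanged.
    """
    if os.name != "nt" or path.startswith(_EXTENDED_PREFIX):
        return path
    abs_path = os.path.abspath(path)  # the prefix only works with absolute paths
    if abs_path.startswith("\\\\"):
        # UNC share \\server\share\... becomes \\?\UNC\server\share\...
        return _EXTENDED_PREFIX + "UNC\\" + abs_path[2:]
    return _EXTENDED_PREFIX + abs_path
```

Usage would be Image.open(to_extended_path(long_path)). Shortening the paths, as the asker ultimately did, avoids the limit without relying on the prefix.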

[Screenshot: output when trying to access an image outside TensorFlow]

[Screenshot: the DataFrame]

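I researched this and found that using absolute paths can help and make it work, but even with absolute paths some images are still ignored. What can I do so that no images are ignored?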

Code leading up to the error output:

import tensorflow as tf

def create_gen():
    train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
        validation_split=0.2
    )
    test_generator = tf.keras.preprocessing.image.ImageDataGenerator(
        preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input
    )

    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='training',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

    val_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='validation',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

    test_images = test_generator.flow_from_dataframe(
        dataframe=validation_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=False
    )

    return train_generator, test_generator, train_images, val_images, test_images

pretrained_model = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights='imagenet',
    pooling='avg'
)

train_generator,test_generator,train_images,val_images,test_images = create_gen()

[Screenshot: output after using "flow_from_dataframe" with TensorFlow]
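To see exactly which rows get dropped, the check flow_from_dataframe performs when validate_filenames=True (its default) can be approximated in plain Python: a row is kept only if its path ends with an accepted image extension and os.path.isfile returns True for it. On Windows without long-path support, os.path.isfile returns False for a path beyond the length limit, which would make a correct-looking file "invalid". The extension list below is an assumption based on Keras' default white list, and split_valid_invalid is a hypothetical helper:

```python
import os

# Assumed to mirror Keras' default accepted formats; check your installed version
ALLOWED_EXTS = (".png", ".jpg", ".jpeg", ".bmp", ".ppm", ".tif", ".tiff")

def split_valid_invalid(filepaths, allowed_exts=ALLOWED_EXTS):
    """Split paths into those an extension + existence check keeps and those it drops."""
    valid, invalid = [], []
    for p in filepaths:
        # Case-insensitive extension check, then make sure the file can be stat'ed
        ok = isinstance(p, str) and p.lower().endswith(allowed_exts) and os.path.isfile(p)
        (valid if ok else invalid).append(p)
    return valid, invalid
```

For example, valid, invalid = split_valid_invalid(train_df['Filepath']) followed by printing invalid lists the files the generator would skip.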

Your generators are incorrect. For example, you have this code:

    train_images = train_generator.flow_from_dataframe(
        dataframe=train_df,
        x_col='Filepath',
        y_col='Class',
        target_size=(224, 224),
        color_mode='rgb',
        class_mode='categorical',
        batch_size=32,
        shuffle=True,
        seed=0,
        subset='training',
        rotation_range=30, # Uncomment to use data augmentation
        zoom_range=0.15,
        width_shift_range=0.2,
        height_shift_range=0.2,
        shear_range=0.15,
        horizontal_flip=True,
        fill_mode="nearest"
    )

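The parameters rotation_range, zoom_range, width_shift_range, height_shift_range, shear_range, horizontal_flip and fill_mode do not belong in .flow_from_dataframe. They are arguments of the ImageDataGenerator constructor and should go there instead, as follows: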

train_generator = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input,
    validation_split=0.2,
    rotation_range=30, zoom_range=0.15,
    width_shift_range=0.2, height_shift_range=0.2, shear_range=0.15,
    horizontal_flip=True, fill_mode="nearest"
)

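For the image path problem, try running the code below. I use sdir as the directory that contains your data; just replace it with the correct directory name.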

import os
from PIL import Image
sdir = r'C:\Temp\balls\train'  # set this to your directory
good_image_count=0
bad_image_count=0
processed_count=0
bad_image_list=[]
classlist=os.listdir(sdir)
for klass in classlist:
    print ('processing class directory ', klass)
    classpath=os.path.join(sdir, klass)
    if os.path.isdir(classpath):
        flist=os.listdir(classpath)
        print (' number of files in class directory ', klass,' is ', len(flist))
        for f in flist:
            processed_count +=1
            fpath=os.path.join(classpath, f)
            if os.path.isfile(fpath):
                try:
                    # verify() checks whether the file is broken without decoding all pixel data
                    with Image.open(fpath) as image:
                        image.verify()
                    good_image_count += 1
                except Exception:
                    bad_image_count += 1
                    bad_image_list.append(fpath)
            else:
                print('in class directory ', klass, ' you have sub directories, you should only have files in it')
    else:
        print ('In the sdir you have files, you should only have class subdirectories in sdir')
print(processed_count, ' files were processed') 
print(good_image_count, ' files were valid image files')
print(bad_image_count, ' files were invalid images or paths did not exist')
for f in bad_image_list:
    print (f)

Sorted it out. It was because the image paths and the image file names themselves were too long.
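Since the cause was path length, a quick way to find the offending files in advance is to flag every path longer than the classic Windows MAX_PATH limit (260 characters including the terminating null, so 259 usable). A minimal sketch, assuming the paths are already absolute (as in the question's DataFrame) and that long-path support is not enabled on the machine; find_long_paths is a hypothetical helper:

```python
# Classic Windows MAX_PATH is 260 characters including the terminating null,
# so 259 visible characters is the practical ceiling for a file path
# (assumption: long-path support is not enabled).
MAX_PATH_CHARS = 259

def find_long_paths(paths, max_len=MAX_PATH_CHARS):
    """Return (length, path) pairs for paths longer than max_len, longest first."""
    too_long = [(len(p), p) for p in paths if len(p) > max_len]
    return sorted(too_long, reverse=True)
```

For example, find_long_paths(train_df['Filepath']) lists the candidates; those files can then be renamed or moved to a shorter directory before the generators are built.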